How to display Kubernetes request and limit in Grafana / Prometheus properly
source link: https://gist.github.com/max-rocket-internet/6a05ee757b6587668a1de8a5c177728b
CPU: percentage of limit
A lot of people land here when trying to find out how to calculate CPU usage correctly in Prometheus, myself included! So I'll post what I eventually ended up using, as I think it's still a little difficult to tie together all the snippets of info here and elsewhere.
This is specific to k8s and containers that have CPU limits set.
To show CPU usage as a percentage of the limit given to the container, this is the Prometheus query we used to create nice graphs in Grafana:
sum(rate(container_cpu_usage_seconds_total{name!~".*prometheus.*", image!="", container_name!="POD"}[5m])) by (pod_name, container_name) /
sum(container_spec_cpu_quota{name!~".*prometheus.*", image!="", container_name!="POD"}/container_spec_cpu_period{name!~".*prometheus.*", image!="", container_name!="POD"}) by (pod_name, container_name)
It returns a number between 0 and 1, so either format the left Y axis as percent (0.0-1.0) or multiply by 100 in the query to get a percentage.
Note that we added some filtering here to get rid of noise: name!~".*prometheus.*", image!="", container_name!="POD". The name!~".*prometheus.*" filter is just because we aren't interested in the CPU usage of all the Prometheus exporters running in our k8s cluster. The image!="" filter drops the aggregate cgroup series that cAdvisor emits, and container_name!="POD" excludes each pod's pause container.
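The quota/period division works because Kubernetes maps a container's CPU limit onto the kernel's CFS scheduler: container_spec_cpu_quota is the CPU time allowed per container_spec_cpu_period (both in microseconds), so quota divided by period is the limit in cores. A minimal sketch of the arithmetic, with made-up sample values:

```python
# Sketch of the arithmetic behind the query above, using made-up sample values.
cpu_quota_us = 50_000    # container_spec_cpu_quota for a 0.5-core limit
cpu_period_us = 100_000  # container_spec_cpu_period (CFS default, 100ms)
limit_cores = cpu_quota_us / cpu_period_us  # 0.5 cores

# rate(container_cpu_usage_seconds_total[5m]) is CPU seconds used per second,
# i.e. average cores consumed over the window (made-up value here).
usage_cores = 0.35

fraction_of_limit = usage_cores / limit_cores
print(fraction_of_limit)               # 0.7 -> format the Y axis as percent (0.0-1.0)
print(round(fraction_of_limit * 100))  # 70 -> or multiply by 100 in the query
```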
CPU: show as cores with request/limit lines
Some applications have a small request and a large limit (to save money) or use an HPA, so showing usage only as a percentage of the limit is not always useful.
So what we do now is display the CPU usage in cores and then add a horizontal line for each of the request and limit. This shows more information and also shows the usage in the same metric that is used in k8s: CPU cores.
CPU usage
Legend: {{container_name}} in {{pod_name}}
Query: sum(rate(container_cpu_usage_seconds_total{pod_name=~"deployment-name-[^-]*-[^-]*$", image!="", container_name!="POD"}[5m])) by (pod_name, container_name)
CPU limit
Legend: limit
Query: sum(kube_pod_container_resource_limits_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)
CPU request
Legend: request
Query: sum(kube_pod_container_resource_requests_cpu_cores{pod=~"deployment-name-[^-]*-[^-]*$"}) by (pod)
You will need to edit these 3 queries for your environment so that only pods from a single deployment are returned, e.g. replace deployment-name.
The pod request/limit metrics come from kube-state-metrics.
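The pod_name pattern works because a Deployment's pods are named <deployment>-<replicaset-hash>-<pod-hash>, so the regex appends two dash-free segments to the deployment name. A small hypothetical helper to illustrate this (deployment_pod_regex is not part of the dashboard, just a sketch); note that Prometheus fully anchors =~ matchers, which re.fullmatch mirrors here:

```python
import re

def deployment_pod_regex(deployment: str) -> str:
    """Build the pod-name pattern used in the queries above.

    A Deployment's pods are named <deployment>-<replicaset-hash>-<pod-hash>,
    so we append two dash-free segments and anchor the end.
    """
    return f"{deployment}-[^-]*-[^-]*$"

pattern = deployment_pod_regex("deployment-name")
print(pattern)  # deployment-name-[^-]*-[^-]*$

# Prometheus anchors =~ regexes at both ends, like re.fullmatch:
print(bool(re.fullmatch(pattern, "deployment-name-7d4b9c6f9-x2k8p")))        # True
print(bool(re.fullmatch(pattern, "deployment-name-extra-7d4b9c6f9-x2k8p")))  # False
```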
We then add 2 series overrides in Grafana to hide the request and limit series in the tooltip and legend.
Queries to show memory and CPU as percentage of both request and limit
Note: in Kubernetes 1.16+ the cAdvisor labels pod_name and container_name on the container_* metrics were renamed to pod and container (matching kube-state-metrics); adjust the label names in these queries to match your cluster's version.
Percentage of CPU request:
round(
100 *
sum(
rate(container_cpu_usage_seconds_total{image!="", container!="POD"}[5m])
) by (pod, container, namespace, slave)
/
sum(
kube_pod_container_resource_requests_cpu_cores
) by (pod, container, namespace, slave)
)
Percentage of CPU limit:
round(
100 *
sum(
rate(container_cpu_usage_seconds_total{image!="", container!="POD"}[5m])
) by (pod, container, namespace, slave)
/
sum(
container_spec_cpu_quota{image!="", container!="POD"} / container_spec_cpu_period{image!="", container!="POD"}
) by (pod, container, namespace, slave)
)
Percentage of memory request:
round(
100 *
sum(container_memory_working_set_bytes{image!="", container!="POD"}) by (container, pod, namespace, slave)
/
sum(kube_pod_container_resource_requests_memory_bytes > 0) by (container, pod, namespace, slave)
)
Percentage of memory limit:
round(
100 *
sum(container_memory_working_set_bytes{image!="", container!="POD"}) by (container, pod, namespace, slave)
/
sum(container_spec_memory_limit_bytes{image!="", container!="POD"} > 0) by (container, pod, namespace, slave)
)
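As a sanity check on the units, the memory queries are a plain ratio of byte values; the > 0 filter drops containers with no limit or request set (container_spec_memory_limit_bytes is 0 for unlimited containers, which would otherwise divide to +Inf). A sketch with made-up numbers:

```python
# Sketch of the memory-percentage arithmetic with made-up sample values.
working_set_bytes = 150 * 1024**2  # container_memory_working_set_bytes: 150 MiB
request_bytes = 256 * 1024**2      # kube_pod_container_resource_requests_memory_bytes: 256 MiB
limit_bytes = 512 * 1024**2        # container_spec_memory_limit_bytes: 512 MiB

# The "> 0" filter in the queries exists because limit_bytes is 0 when no
# limit is set, and dividing by 0 would produce +Inf series in the graph.
pct_of_request = round(100 * working_set_bytes / request_bytes)
pct_of_limit = round(100 * working_set_bytes / limit_bytes)
print(pct_of_request)  # 59
print(pct_of_limit)    # 29
```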