31

Scaling Percona Monitoring and Management (PMM)

 5 years ago
source link: https://www.tuicool.com/articles/hit/FNbENbi
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Starting withPMM 1.13,  PMM uses Prometheus 2 for metrics storage, which tends to be heaviest resource consumer of CPU and RAM.  With Prometheus 2 Performance Improvements, PMM can scale to more than 1000 monitored nodes per instance in default configuration. In this blog post we will look into PMM scaling and capacity planning—how to estimate the resources required, and what drives resource consumption.

iMbIBza.png!web

We have now tested PMM with up to 1000 nodes, using a virtualized system with 128GB of memory, 24 virtual cores, and SSD storage. We found PMM scales pretty linearly with the available memory and CPU cores, and we believe that a higher number of nodes could be supported with more powerful hardware.

What drives resource usage in PMM ?

Depending on your system configuration and workload, a single node can generate very different loads on the PMM server. The main factors that impact the performance of PMM are:

  1. Number of samples (data points) injected into PMM per second
  2. Number of distinct time series they belong to (cardinality)
  3. Number of distinct query patterns your application uses
  4. Number of queries you have on PMM, through the user interface on the API, and their complexity

These specifically can be impacted by:

  • Software version – modern database software versions expose more metrics)
  • Software configuration  – some metrics are only exposed in certain configuration
  • Workload – a large number of database objects and high concurrency will increase both the number of samples ingested and their cardinality.
  • Exporter configuration – disabling collectors can reduce amount of data collectors
  • Scrape frequency –  controlled by METRICS_RESOLUTION

All these factors together may impact resource requirements by a factor of ten or more, so do your own testing to be sure. However, the numbers in this article should serve as good general guidance as a start point for your research.

On the system supporting 1000 instances we observed the following performance:

6j2q2mM.png!web

As you can see, we have more than 2.000 scrapes/sec performed, providing almost two million samples/sec, and more than eight million active time series. These are the main numbers that define the load placed on Prometheus.

Capacity planning to scale PMM

Both CPU and memory are very important resources for PMM capacity planning. Memory is the more important as Prometheus 2 does not have good options for limiting memory consumption. If you do not have enough memory to handle your workload, then it will run out of memory and crash.

We recommend at least 2GB of memory for a production PMM Installation. A test installation with 1GB of memory is possible. However, it may not be able to monitor more than one or two nodes without running out of memory. With 2GB of memory you should be able to monitor at least five nodes without problem.

With powerful systems (8GB of more) you can have approximately eight systems per 1GB of memory, or about 15,000 samples ingested/sec per 1GB of memory.

To calculate the CPU usage resources required, allow for about 50 monitored systems per core (or 100K metrics/sec per CPU core).

One problem you’re likely to encounter if you’re running PMM with 100+ instances is the “Home Dashboard”. This becomes way too heavy with such a large number of servers. We plan to fix this issue in future releases of PMM, but for now you can work around it in two simple ways:

You can select the host, for example “pmm-server” in your home dashboard and save it, before adding a large amount of hosts to the system.

VvuiArj.png!web

Or you can make some other dashboard of your choice and set it as the home dashboard.

Summary

  • More than 1,000 monitored systems is possible per single PMM server
  • Your specific workload and configuration may significantly change the resources required
  • If deploying with 8GB or more, plan 50 systems per core, and eight systems per 1GB of RAM

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK