Deploying Prometheus for monitoring and stats collection
source link: http://www.linux-admins.net/2016/06/deploying-prometheus-for-monitoring-and.html
Prometheus is a monitoring, alerting and statistics collection tool [1]. It provides a multi-dimensional data model (time series identified by a metric name and key/value pairs) and query capabilities similar to Graphite's. Collection happens via a pull model over HTTP, which makes it a good fit for microservice environments. As long as a service exposes metrics over a RESTful API, Prometheus can scrape, store, query and alert on them. For graphing and visualisation, Prometheus integrates with Grafana, which can be used to create dashboards.
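To make the data model a little more concrete, here is a minimal Python sketch of how one time series (a metric name plus key/value labels) can be rendered as a line of the text format that Prometheus scrapes. The function name `render_sample` is hypothetical and the sketch skips escaping of special characters in label values, so treat it as an illustration rather than a client library.

```python
def render_sample(name, labels, value):
    """Render one sample as a line of Prometheus text exposition format.

    Assumption: label values contain no quotes, backslashes or newlines,
    so no escaping is performed here.
    """
    if labels:
        # Sorting is only for deterministic output; label order has no meaning.
        pairs = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
        return "{}{{{}}} {}".format(name, pairs, value)
    return "{} {}".format(name, value)


print(render_sample("node_disk_io_time_ms", {"device": "dm-0"}, 31188))
print(render_sample("test_metric", {}, 1))
```

Each distinct combination of metric name and label values is a separate time series, which is what makes the model multi-dimensional.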
In this post I'll deploy the Prometheus server, the alerting module called Alertmanager, the Node Exporter module which exports various low-level server stats, Grafana as a front-end and exim4 for sending email alerts.
Since Prometheus is written in Go, let's install the dependencies, build the server binary and build a Docker image to run the service in:
[prometheus-server]$ add-apt-repository ppa:ubuntu-lxc/lxd-stable
[prometheus-server]$ apt-get update && apt-get install golang
[prometheus-server]$ export GOPATH=/usr/lib/go; export GOROOT=/usr/lib/go/
[prometheus-server]$ mkdir -p $GOPATH/src/github.com/prometheus
[prometheus-server]$ cd $GOPATH/src/github.com/prometheus
[prometheus-server]$ git clone https://github.com/prometheus/prometheus.git
[prometheus-server]$ cd prometheus
[prometheus-server]$ make build
[prometheus-server]$ make docker
[prometheus-server]$ docker images | grep prometheus
prometheus    master    8f24da86430e    1 minute ago    43.21 MB
[prometheus-server]$
[prometheus-server]$ cd $GOPATH/src/github.com/prometheus
[prometheus-server]$ git clone https://github.com/prometheus/node_exporter.git
[prometheus-server]$ cd node_exporter
[prometheus-server]$ make build
[prometheus-server]$ make docker
[prometheus-server]$ docker images | grep node-exporter
node-exporter    master    40a33f49d66f    1 minute ago    17 MB
[prometheus-server]$
[prometheus-server]$ docker run -d -p 9100:9100 --net="host" node-exporter:master
[prometheus-server]$ docker ps | grep node-exporter
8629a88bba36    node-exporter:master    "/bin/node_exporter"    1 minutes ago    Up 1 minutes    desperate_mcclintock
[prometheus-server]$
[prometheus-server]$ ss -o state listening '( sport = :9100 )'
Netid  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
tcp    0       128     :::9100             :::*
[prometheus-server]$
[prometheus-server]$ curl localhost:9100/metrics
...
# HELP node_disk_io_time_ms Milliseconds spent doing I/Os.
# TYPE node_disk_io_time_ms counter
node_disk_io_time_ms{device="dm-0"} 31188
node_disk_io_time_ms{device="dm-1"} 964
node_disk_io_time_ms{device="dm-10"} 788
...
[prometheus-server]$
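The /metrics output above is plain text, so it is easy to inspect with a few lines of code. The following is a rough Python sketch that parses such text into (name, labels, value) tuples. The names `parse_metrics`, `SAMPLE_RE` and `LABEL_RE` are hypothetical, and the parser is deliberately simplified: it assumes no timestamps and no escaped quotes in label values, and silently skips anything it does not recognise.

```python
import re

# name, optional {labels}, then a single value token (no timestamp support).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Parse exposition-format text into a list of (name, labels, value)."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # unsupported line shape, ignored in this sketch
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


# Sample taken from the curl output shown above.
text = """\
# HELP node_disk_io_time_ms Milliseconds spent doing I/Os.
# TYPE node_disk_io_time_ms counter
node_disk_io_time_ms{device="dm-0"} 31188
node_disk_io_time_ms{device="dm-1"} 964
node_disk_io_time_ms{device="dm-10"} 788
"""

for name, labels, value in parse_metrics(text):
    print(name, labels["device"], value)
```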
[prometheus-server]$ mkdir -p /etc/prometheus/prometheus-data
[prometheus-server]$ vim prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9090']

  - job_name: node
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9100']
[prometheus-server]$
[prometheus-server]$ docker run -d -p 9090:9090 -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml prometheus:master -config.file=/etc/prometheus/prometheus.yml
[prometheus-server]$ docker ps | grep prometheus
83a773b218db    prometheus:master    "/bin/prometheus -con"    1 minutes ago    Up 1 minutes    0.0.0.0:9090->9090/tcp    jovial_goldstine
[prometheus-server]$
[prometheus-server]$ cat prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 5s

scrape_configs:
  - job_name: prometheus
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9090']

  - job_name: node
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9100']

  - job_name: test
    metrics_path: /
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9999']
        labels:
          service_name: test_service
[prometheus-server]$ docker kill 83a773b218db
[prometheus-server]$ docker run -d -p 9090:9090 -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml prometheus:master -config.file=/etc/prometheus/prometheus.yml
[prometheus-server]$ echo "test_metric 1" | nc -k -l 9999
GET / HTTP/1.1
User-Agent: curl/7.35.0
Host: localhost:9999
Accept: */*
...
GET / HTTP/1.1
User-Agent: curl/7.35.0
Host: localhost:9999
Accept: */*
^C
[prometheus-server]$
The output above shows Prometheus GET-ing / at regular intervals.
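The nc one-liner is handy for a quick test, but it does not send proper HTTP response headers. As an alternative, here is a small Python sketch of a test target that serves the same `test_metric 1` line on / over real HTTP, using only the standard library. The names `MetricHandler` and `start_server` are hypothetical. The demo binds to an ephemeral port on 127.0.0.1 so it can run anywhere; to match the scrape config in this post you would bind to port 9999 on an address Prometheus can reach.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class MetricHandler(BaseHTTPRequestHandler):
    """Answer every GET with a single metric in text exposition format."""

    def do_GET(self):
        body = b"test_metric 1\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet; remove to see each scrape logged


def start_server(port=0):
    """Start the target in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


server = start_server()
port = server.server_address[1]

# Fetch / once, the same way a scraper would.
conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
conn.request("GET", "/")
response = conn.getresponse()
status = response.status
scraped = response.read().decode()
conn.close()

server.shutdown()
server.server_close()
print(status, scraped, end="")
```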
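To query for the newly exposed metric, call the HTTP API endpoint /api/v1/query with query=test_metric; the full curl command and its JSON response are shown further below.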
All this data is being collected and stored in a similar way to Graphite. We can use Grafana to create graphs, dashboards etc.:
[prometheus-server]$ curl -L -O https://grafanarel.s3.amazonaws.com/builds/grafana-2.5.0.linux-x64.tar.gz
[prometheus-server]$ tar zxfv grafana-2.5.0.linux-x64.tar.gz && cd grafana-2.5.0
[prometheus-server]$ nohup ./bin/grafana-server web &
[prometheus-server]$ cd $GOPATH/src/github.com/prometheus
[prometheus-server]$ git clone https://github.com/prometheus/alertmanager.git
[prometheus-server]$ cd alertmanager
[prometheus-server]$ make build
[prometheus-server]$ make docker
[prometheus-server]$ docker images | grep alertmanager
alertmanager    master    b05b2acf17eb    1 minute ago    16.84 MB
[prometheus-server]$
[prometheus-server]$ cat /etc/prometheus/prometheus-data/alertmanager.yml
global:
  smtp_smarthost: 'localhost:4444'
  smtp_from: '[email protected]'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 5m
  receiver: team-mg-email

  routes:
    - match_re:
        service_name: ^(test_service)$
      receiver: team-mg-email

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']

receivers:
  - name: 'team-mg-email'
    email_configs:
      - to: '[email protected]'
        require_tls: false
[prometheus-server]$
[prometheus-server]$ docker run -d -p 9093:9093 -v /etc/prometheus/prometheus-data/alertmanager.yml:/etc/prometheus/alertmanager.yml alertmanager:master -config.file=/etc/prometheus/alertmanager.yml
[prometheus-server]$ docker ps | grep alertmanager
c8e317f31c32    alertmanager:master    "/bin/alertmanager -c"    1 minute ago    Up 1 minute    0.0.0.0:9093->9093/tcp    sharp_thompson
[prometheus-server]$ ss -o state listening '( sport = :9093 )'
Netid  Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
tcp    0       128     :::9093             :::*
[prometheus-server]$
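To illustrate what the route section above asks Alertmanager to do, here is a simplified Python model of two ideas: picking a receiver by matching an alert's labels against `match_re`, and computing the `group_by` key that decides which alerts are batched into one notification. This is only a sketch for intuition, not Alertmanager's implementation; the function names `route_alert` and `group_key` are hypothetical, and timing options such as `group_wait` and inhibition rules are ignored.

```python
import re

# Values copied from the alertmanager.yml shown above.
GROUP_BY = ["alertname", "cluster", "service"]
DEFAULT_RECEIVER = "team-mg-email"
CHILD_ROUTES = [
    {"match_re": {"service_name": r"^(test_service)$"}, "receiver": "team-mg-email"},
]


def route_alert(labels):
    """Return (receiver, matched_child_route) for one alert's label set."""
    for route in CHILD_ROUTES:
        # Every regex must match the whole label value; a missing label
        # is treated as an empty string in this sketch.
        if all(re.fullmatch(pattern, labels.get(name, ""))
               for name, pattern in route["match_re"].items()):
            return route["receiver"], True
    return DEFAULT_RECEIVER, False


def group_key(labels):
    """Alerts with the same key would be grouped into one notification."""
    return tuple(labels.get(name, "") for name in GROUP_BY)


alert = {"alertname": "TestServiceDown", "service_name": "test_service", "job": "test"}
print(route_alert(alert))
print(group_key(alert))
```

In this particular config the child route and the top-level route point at the same receiver, so the match only matters once a second receiver is added.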
[prometheus-server]$ cat /etc/prometheus/prometheus-data/test_service.rules
ALERT TestServiceDown
  IF up{job="test"} != 1
  FOR 5s
  LABELS { service_name = "test_service" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 seconds.",
  }
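The FOR clause in the rule above means the condition has to hold for a while before the alert actually fires. A rough Python model of that behaviour, assuming the rule is evaluated every 5 seconds and using the 5 second FOR duration from the rule, might look like this. The function name `alert_states` is hypothetical and this is a simplification for intuition, not Prometheus's rule engine.

```python
def alert_states(results, eval_interval, for_duration):
    """Map a sequence of rule evaluation results to alert states.

    results       -- one boolean per evaluation: did the IF expression match?
    eval_interval -- seconds between evaluations
    for_duration  -- seconds the condition must hold before firing
    """
    states = []
    active_since = None
    for i, is_true in enumerate(results):
        now = i * eval_interval
        if not is_true:
            active_since = None  # condition cleared, start over
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now  # first evaluation where the condition held
        held_for = now - active_since
        states.append("firing" if held_for >= for_duration else "pending")
    return states


# The target is down on evaluations 2-3, recovers, then goes down again.
print(alert_states([False, True, True, False, True], eval_interval=5, for_duration=5))
```

Note that `up` itself only changes when a scrape happens, so with a 30 second scrape interval an outage can go unnoticed for up to that long before the rule sees it.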
[prometheus-server]$ cat /etc/prometheus/prometheus-data/prometheus.yml
global:
  scrape_interval: 30s
  evaluation_interval: 5s

rule_files:
  - "/etc/prometheus/test_service.rules"

scrape_configs:
  - job_name: prometheus
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9090']

  - job_name: node
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9100']

  - job_name: test
    metrics_path: /
    target_groups:
      - targets: ['api-01.us-east-1.example.com:9999']
        labels:
          service_name: test_service
[prometheus-server]$ docker kill 83a773b218db
[prometheus-server]$ docker run -d -p 9090:9090 -v /etc/prometheus/prometheus-data/prometheus.yml:/etc/prometheus/prometheus.yml -v /etc/prometheus/prometheus-data/alertmanager.yml:/etc/prometheus/alertmanager.yml -v /etc/prometheus/prometheus-data/test_service.rules:/etc/prometheus/test_service.rules prometheus:master -config.file=/etc/prometheus/prometheus.yml -alertmanager.url=http://localhost:9093
[prometheus-server]$
[prometheus-server]$ curl -s 'http://localhost:9090/api/v1/query?query=test_metric' | python -mjson.tool
{
    "data": {
        "result": [
            {
                "metric": {
                    "__name__": "test_metric",
                    "instance": "api-01.us-east-1.example.com:9999",
                    "job": "test",
                    "service_name": "test_service"
                },
                "value": [
                    1465933958.146,
                    "1"
                ]
            }
        ],
        "resultType": "vector"
    },
    "status": "success"
}
[prometheus-server]$
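The same query can be scripted. Below is a minimal Python sketch that builds the /api/v1/query URL with proper escaping and pulls the label set, timestamp and numeric value out of a response shaped like the one above. The helper names `build_query_url` and `instant_vector` are hypothetical, and the response is embedded as a string so the example runs without a live Prometheus server; in practice you would fetch it from the URL.

```python
import json
from urllib.parse import urlencode

# Response body copied from the curl output above.
RESPONSE = """
{"data": {"result": [{"metric": {"__name__": "test_metric",
                                 "instance": "api-01.us-east-1.example.com:9999",
                                 "job": "test",
                                 "service_name": "test_service"},
                      "value": [1465933958.146, "1"]}],
          "resultType": "vector"},
 "status": "success"}
"""


def build_query_url(base_url, expression):
    """Build an instant-query URL, escaping the expression for use in a URL."""
    return "{}/api/v1/query?{}".format(base_url, urlencode({"query": expression}))


def instant_vector(payload):
    """Return [(labels, timestamp, value)] from a parsed instant-query response."""
    if payload.get("status") != "success":
        raise ValueError("query failed: {!r}".format(payload))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected a vector, got {}".format(data["resultType"]))
    samples = []
    for series in data["result"]:
        timestamp, raw_value = series["value"]
        # Sample values arrive as strings and are converted to floats here.
        samples.append((series["metric"], float(timestamp), float(raw_value)))
    return samples


print(build_query_url("http://localhost:9090", 'up{job="test"}'))
for labels, timestamp, value in instant_vector(json.loads(RESPONSE)):
    print(labels["instance"], labels["service_name"], timestamp, value)
```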
[prometheus-server]$ docker run -d -p 4444:25 -v /tmp/exim:/var/spool/exim4 -e PRIMARY_HOST=us-east-1.example.com -e ALLOWED_HOSTS="10.1.0.0/16" elsdoerfer/exim-sender
[prometheus-server]$ docker ps | grep exim
f98a1e7dc9cf    elsdoerfer/exim-sender    "/exim"    1 minutes ago    Up 1 minute    0.0.0.0:4444->25/tcp    elated_franklin
[prometheus-server]$
If the service we want to monitor does not provide an API we can use probing over HTTP, HTTPS, TCP and ICMP with the Blackbox exporter [4].
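To give a feel for what probing means here, the following Python sketch performs the simplest kind of check, a TCP connect with a timeout, and reports success or failure. It is not how the Blackbox exporter is implemented; the function name `tcp_probe` is hypothetical, and the demo probes a throwaway local listener rather than a real service.

```python
import socket


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Demo: open a local listener, probe it, close it, then probe again.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

up_result = tcp_probe("127.0.0.1", port)
listener.close()
down_result = tcp_probe("127.0.0.1", port)

print(up_result, down_result)
```

An exporter built around such a check would turn the boolean into a metric that Prometheus scrapes, so the alerting rules shown earlier can be applied to services that expose no metrics of their own.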
Resources:
[1]. https://prometheus.io/docs/introduction/overview/
[2]. https://github.com/prometheus/node_exporter
[3]. https://github.com/prometheus/alertmanager
[4]. https://github.com/prometheus/blackbox_exporter