
Prometheus Study Notes – Configuring Prometheus Email Alerts | Better to act than merely to talk! 二丫讲梵

 3 years ago
source link: http://www.eryajf.net/2475.html

1. Install and configure Alertmanager

```shell
$ tar xf alertmanager-0.15.2.linux-amd64.tar.gz -C /usr/local/
$ mv /usr/local/alertmanager-0.15.2.linux-amd64/ /usr/local/alertmanager
```

2. Create the systemd unit file

```shell
$ vim /usr/lib/systemd/system/alertmanager.service
```

Add the following content:

```ini
[Unit]
Description=alertmanager
Documentation=https://github.com/prometheus/alertmanager
After=network.target

[Service]
Type=simple
User=prometheus
# Keep silences and notification state in the install directory (chowned to prometheus in step 5)
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alert-test.yml --storage.path=/usr/local/alertmanager/data
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

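The Alertmanager installation directory ships with a default alertmanager.yml configuration file. You can also create a new configuration file and pass it at startup with --config.file, as the unit file above does.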

```shell
$ cd /usr/local/alertmanager
$ vim alert-test.yml
```

```yaml
global:
  smtp_smarthost: 'smtp.163.com:25'
  smtp_from: 'sender@163.com'             # placeholder: the sending address
  smtp_auth_username: 'sender@163.com'    # placeholder: the sending account
  smtp_auth_password: '123546'            # the mailbox authorization code, not the login password
  smtp_require_tls: false
templates:
  - '/alertmanager/template/*.tmpl'
route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 10m
  receiver: default-receiver
receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'receiver@example.com'        # placeholder: the recipient address
        html: '{{ template "alert.html" . }}'
        headers: { Subject: "[WARN] Alert email test" }
```

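I first used my company mailbox, but during the verification step sending kept failing with `level=error ts=2019-01-26T06:21:59.062483579Z caller=notify.go:332 component=dispatcher msg="Error on notify" err="*smtp.plainAuth failed: unencrypted connection"`. I read through reports from others who had hit the same problem and tried ports 25, 465 and 587, none of which helped. After switching to a 163 mailbox, it worked right away.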
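The `*smtp.plainAuth` in that error suggests Alertmanager was authenticating with PLAIN through Go's standard `net/smtp` package. According to the Go documentation, `smtp.PlainAuth` only sends credentials when the connection uses TLS or the server is localhost; otherwise it fails with "unencrypted connection" without sending anything. With `smtp_require_tls: false`, Alertmanager presumably does not upgrade the connection with STARTTLS, so if it picks PLAIN for a remote server it would hit exactly this check. The sketch below runs only the client-side `Start` step against a simulated `smtp.ServerInfo`, so no network connection is made; the helper `tryPlainAuth` and the addresses are hypothetical.

```go
package main

import (
	"fmt"
	"net/smtp"
)

// tryPlainAuth runs only the client-side check performed by smtp.PlainAuth.
// host is the SMTP server name; tls says whether the simulated connection
// is encrypted. No network connection is opened.
func tryPlainAuth(host string, tls bool) error {
	auth := smtp.PlainAuth("", "user@example.com", "secret", host)
	// The server advertises PLAIN, but without TLS that is not enough.
	_, _, err := auth.Start(&smtp.ServerInfo{Name: host, TLS: tls, Auth: []string{"PLAIN"}})
	return err
}

func main() {
	fmt.Println(tryPlainAuth("smtp.example.com", false)) // remote server, plain connection
	fmt.Println(tryPlainAuth("smtp.example.com", true))  // remote server, TLS connection
	fmt.Println(tryPlainAuth("localhost", false))        // localhost is exempt from the TLS check
}
```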

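  • smtp_smarthost: the SMTP server address and port of the mailbox used to send mail;
  • smtp_auth_password: the authorization code of the sending mailbox, not its login password;
  • smtp_require_tls: defaults to true if not set; with true a STARTTLS error occurs, so for simplicity it is set to false here;
  • templates: the path of the email template files;
  • html (under receivers): the template used for the email body; here the template is named "alert.html" and is defined in one of the files under the templates path;
  • headers: extra email headers; the Subject set here becomes the email title.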
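The route section of alert-test.yml decides how alerts are batched: alerts with equal values for the `group_by` labels (`alertname`, `cluster`, `service`) land in the same group, and each group is sent as one notification: after `group_wait` for a new group, after `group_interval` when new alerts join a group that was already notified, and again after `repeat_interval` while it keeps firing unchanged. Labels an alert does not have count as empty, so alerts without `cluster` and `service` are effectively grouped by `alertname`. The following is a simplified model of the grouping step only, not Alertmanager's code; `groupKey`, `groupAlerts` and the sample alerts are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// groupKey joins the values of the group_by labels into one key.
// A label missing from the alert contributes an empty value.
func groupKey(labels map[string]string, groupBy []string) string {
	parts := make([]string, 0, len(groupBy))
	for _, name := range groupBy {
		parts = append(parts, name+"="+labels[name])
	}
	return strings.Join(parts, ",")
}

// groupAlerts buckets alerts by group key; each bucket would become one email.
func groupAlerts(alerts []map[string]string, groupBy []string) map[string][]map[string]string {
	groups := make(map[string][]map[string]string)
	for _, a := range alerts {
		k := groupKey(a, groupBy)
		groups[k] = append(groups[k], a)
	}
	return groups
}

func main() {
	groupBy := []string{"alertname", "cluster", "service"}
	// Hypothetical alerts; the rule in this article sets no cluster/service labels.
	alerts := []map[string]string{
		{"alertname": "InstanceStatus", "instance": "192.168.111.4:9100"},
		{"alertname": "InstanceStatus", "instance": "192.168.111.5:9100"},
		{"alertname": "DiskFull", "instance": "192.168.111.4:9100"},
	}
	groups := groupAlerts(alerts, groupBy)

	// Print groups in a stable order.
	keys := make([]string, 0, len(groups))
	for k := range groups {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s -> %d alert(s)\n", k, len(groups[k]))
	}
}
```

Both InstanceStatus alerts share one group here, so they would arrive in a single email rather than two.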

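3. Configure the alerting rules

Create rule.yml:

```shell
$ cd /usr/local/prometheus
$ vim rule.yml
```

```yaml
groups:
- name: alert-rules.yml
  rules:
  - alert: InstanceStatus          # alert name
    expr: up{job="111.4"} == 0     # condition; job must match a job_name in prometheus.yml
    for: 10s                       # the condition must hold for 10s before the alert fires
    labels:                        # labels attached to the alert
      severity: "critical"
    annotations:                   # extra information, not used to identify the alert
      description: "Server {{ $labels.instance }} has been down for more than 20s"
      summary: "Server {{ $labels.instance }} status"
```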

Point prometheus.yml at rule.yml:

```shell
$ cat prometheus.yml
```

```yaml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093 # changed to localhost here

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/usr/local/prometheus/rule.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090','localhost:9100']
  - job_name: '111.4'
    scrape_interval: 5s
    static_configs:
    - targets: ['192.168.111.4:9100']
```
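As the comment in scrape_configs says, the `job` label on every scraped series, including `up`, comes from `job_name`. A rule selector such as `up{job="111.4"}` therefore only sees targets scraped by the `111.4` job, and a selector naming a job that does not exist matches no series, so its rule never fires. The following sketch models equality label matching on a few hypothetical `up` samples; `series` and `selectEqual` are illustrative names, not Prometheus APIs.

```go
package main

import "fmt"

// series is a hypothetical `up` sample: its label set and value (1 = up, 0 = down).
type series struct {
	labels map[string]string
	value  float64
}

// selectEqual keeps the series whose labels contain every name=value pair in
// matchers, a simplified model of a PromQL selector with equality matchers.
func selectEqual(all []series, matchers map[string]string) []series {
	var out []series
	for _, s := range all {
		ok := true
		for name, want := range matchers {
			if s.labels[name] != want {
				ok = false
				break
			}
		}
		if ok {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	// Labels as they would result from the scrape_configs above.
	up := []series{
		{map[string]string{"job": "prometheus", "instance": "localhost:9090"}, 1},
		{map[string]string{"job": "prometheus", "instance": "localhost:9100"}, 1},
		{map[string]string{"job": "111.4", "instance": "192.168.111.4:9100"}, 0},
	}
	for _, s := range selectEqual(up, map[string]string{"job": "111.4"}) {
		fmt.Println(s.labels["instance"], s.value)
	}
	// A job name that no scrape config defines matches nothing.
	fmt.Println(len(selectEqual(up, map[string]string{"job": "no-such-job"})))
}
```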

Restart the Prometheus service:

```shell
$ chown prometheus:prometheus /usr/local/prometheus/rule.yml
$ systemctl restart prometheus
```

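4. Write the email template

Note: the file name must end in .tmpl so that it matches the templates glob '/alertmanager/template/*.tmpl' in alert-test.yml.

```shell
$ mkdir -pv /alertmanager/template/
$ vim /alertmanager/template/alert.tmpl
```

```html
{{ define "alert.html" }}
<table>
    <tr><td>Alert name</td><td>Start time</td></tr>
    {{ range $i, $alert := .Alerts }}
    <tr><td>{{ index $alert.Labels "alertname" }}</td><td>{{ $alert.StartsAt }}</td></tr>
    {{ end }}
</table>
{{ end }}
```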

5. Start Alertmanager

```shell
$ chown -R prometheus:prometheus /usr/local/alertmanager
$ systemctl daemon-reload
$ systemctl start alertmanager.service
$ systemctl status alertmanager.service
$ ss -tnl | grep 9093
```

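6. Verify the result

At this point, the alert information can be viewed in the management UI.

Next, stop the node_exporter service on the 111.4 node and watch what happens:

```shell
$ systemctl stop node_exporter.service
```

The mailbox should then receive an alert email.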

