
Monitoring Golang Services with Prometheus and Grafana

 4 years ago
source link: https://studygolang.com/articles/25599

Environment

CentOS 7.0

Prometheus 2.14.0

Grafana 6.5.2

Download and install Prometheus

wget https://github.com/prometheus/prometheus/releases/download/v2.14.0/prometheus-2.14.0.linux-386.tar.gz

tar -xavf prometheus-2.14.0.linux-386.tar.gz

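Start

The extracted directory already contains a default configuration file, prometheus.yml, which can be used without modification.

./prometheus --config.file=prometheus.yml

Open http://<host IP>:9090 in a browser to see the Prometheus UI.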

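Metric types

<1> Counter: a cumulative value that only ever increases (it goes back to zero only when the process restarts). Typically used for totals such as the number of requests served or errors encountered.

<2> Gauge: like a dial on a dashboard, it represents an instantaneous value that can go up or down arbitrarily. Typically used for memory usage, disk usage, number of open files, and so on.

<3> Histogram: samples observations and counts them in configurable buckets, so the distribution over those ranges can be computed; the data is usually displayed as a histogram. Typically used for request or response durations.

<4> Summary: also samples observations over a period of time, but it reports quantiles computed directly on the client, along with the total sum and count, rather than deriving them from bucket ranges.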
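The difference between the first two types can be sketched in a few lines of plain Go. The counter and gauge types below are hypothetical stand-ins written for illustration, not the client library's implementation; the real Counter in client_golang panics on a negative Add instead of returning an error.

```go
package main

import (
	"errors"
	"fmt"
)

// counter models a value that may only grow (hypothetical type).
type counter struct{ value float64 }

// Add increases the counter and rejects negative deltas,
// because a counter must never go down.
func (c *counter) Add(delta float64) error {
	if delta < 0 {
		return errors.New("counter cannot decrease")
	}
	c.value += delta
	return nil
}

// gauge models a value that can be overwritten at any time (hypothetical type).
type gauge struct{ value float64 }

// Set replaces the current value, whether higher or lower.
func (g *gauge) Set(v float64) { g.value = v }

func main() {
	var c counter
	_ = c.Add(3)
	err := c.Add(-1) // rejected, the value stays at 3

	var g gauge
	g.Set(8000)
	g.Set(120) // a gauge may drop freely

	fmt.Println(c.value, err != nil, g.value)
}
```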

Grafana

Download

wget https://dl.grafana.com/oss/release/grafana-6.5.2-1.x86_64.rpm

Install

sudo yum localinstall grafana-6.5.2-1.x86_64.rpm

Start

systemctl daemon-reload 
systemctl start grafana-server
systemctl status grafana-server

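Configuration file

The service's environment file is /etc/sysconfig/grafana-server. Its CONF_FILE entry points to the main configuration file, /etc/grafana/grafana.ini.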

GRAFANA_USER=grafana
GRAFANA_GROUP=grafana
GRAFANA_HOME=/usr/share/grafana
LOG_DIR=/var/log/grafana
DATA_DIR=/var/lib/grafana
MAX_OPEN_FILES=10000
CONF_DIR=/etc/grafana
CONF_FILE=/etc/grafana/grafana.ini
RESTART_ON_UPGRADE=true
PLUGINS_DIR=/var/lib/grafana/plugins
PROVISIONING_CFG_DIR=/etc/grafana/provisioning
# Only used on systemd systems
PID_FILE_DIR=/var/run/grafana

Access

Open http://<host IP>:3000 in a browser. The initial username and password are both admin.

After logging in, Grafana asks you to create your first data source. Choose Prometheus, since that is what the dashboards below query.

Then create a new dashboard.

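Examples

The following examples show what this looks like in practice.

The full test code is in the example code linked from the original article.

Counter

This example monitors the number of RPC calls. A counter's value keeps accumulating.

Key parts of the Go code: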

//Create a new CounterVec
rpcCounter = prometheus.NewCounterVec(
    prometheus.CounterOpts{
        Name: "rpc_counter",
        Help: "RPC counts",
    },
    []string{"api"},
)
    
//registers the provided collector     
prometheus.MustRegister(rpcCounter)

//Add the given value to counter
rpcCounter.WithLabelValues("api_bookcontent").Add(float64(rand.Int31n(50)))
rpcCounter.WithLabelValues("api_chapterlist").Add(float64(rand.Int31n(10)))
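To make it concrete what Prometheus reads when it scrapes this service, here is a minimal standard-library sketch that renders a counter family in the Prometheus text exposition format. It is not how the client library produces its output; renderCounter is a hypothetical helper and the sample values are made up. It assumes the service exposes its metrics at the default /metrics path.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderCounter formats one counter family in the Prometheus text
// exposition format. values maps the "api" label value to its count.
// Hypothetical helper for illustration only.
func renderCounter(name, help string, values map[string]float64) string {
	var b strings.Builder
	fmt.Fprintf(&b, "# HELP %s %s\n", name, help)
	fmt.Fprintf(&b, "# TYPE %s counter\n", name)

	// Sort label values so the output is deterministic.
	apis := make([]string, 0, len(values))
	for api := range values {
		apis = append(apis, api)
	}
	sort.Strings(apis)

	for _, api := range apis {
		fmt.Fprintf(&b, "%s{api=%q} %g\n", name, api, values[api])
	}
	return b.String()
}

func main() {
	// Made-up counts for the two label values used in the article.
	fmt.Print(renderCounter("rpc_counter", "RPC counts", map[string]float64{
		"api_bookcontent": 42,
		"api_chapterlist": 7,
	}))
}
```

Each label value becomes its own time series, which is why two separate lines show up later in the graphs.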

Add the following job under scrape_configs in the Prometheus configuration file:

  - job_name: 'req-monitor'
    static_configs:
      - targets: ['localhost:8082']
        labels:
          group: 'newgroup1'

Restart Prometheus (replace xxxx with the PID reported by ps):

ps -aux | grep prometheus
kill -9 xxxx

./prometheus --config.file=prometheus.yml

Build the program (to run on Linux)

GOOS=linux go build

Run

./prometheus_rpc_http -listen-address=:8082 &

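View the metric in Prometheus by querying rpc_counter.

Create a new dashboard in Grafana.

The query is rate(rpc_counter[1m]), which gives the average per-second rate of increase of rpc_counter over the last minute.

The graph shows two lines, api="api_bookcontent" and api="api_chapterlist", which are exactly the labels set in the code through rpcCounter.WithLabelValues().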
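What rate(rpc_counter[1m]) computes can be approximated with a short sketch: sum the increases between consecutive samples in the window, treat any drop as a counter reset, and divide by the elapsed seconds. This is a simplification under stated assumptions; the real PromQL function also extrapolates to the window boundaries, which is omitted here. The sample type and simpleRate function are hypothetical names.

```go
package main

import "fmt"

// sample is one scraped counter value (hypothetical type).
type sample struct {
	t float64 // timestamp in seconds
	v float64 // counter value at that time
}

// simpleRate returns the average per-second increase across the samples.
// A drop between two samples is treated as a counter reset, so the new
// value itself counts as the increase since the restart.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	increase := 0.0
	for i := 1; i < len(samples); i++ {
		delta := samples[i].v - samples[i-1].v
		if delta < 0 {
			delta = samples[i].v // counter restarted from zero
		}
		increase += delta
	}
	elapsed := samples[len(samples)-1].t - samples[0].t
	if elapsed <= 0 {
		return 0
	}
	return increase / elapsed
}

func main() {
	// Made-up samples, scraped every 15 seconds over one minute.
	window := []sample{{0, 100}, {15, 130}, {30, 160}, {45, 190}, {60, 220}}
	fmt.Printf("%.2f increments per second\n", simpleRate(window))
}
```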

Gauge

Key parts of the Go code:

rpcReqSize = prometheus.NewGaugeVec(
    prometheus.GaugeOpts{
        Name: "rpc_req_size",
        Help: "RPC request size",
    },
    []string{"api"},
)
    
prometheus.MustRegister(rpcReqSize)

rpcReqSize.WithLabelValues("api_bookcontent").Set(float64(rand.Int31n(8000)))
rpcReqSize.WithLabelValues("api_chapterlist").Set(float64(rand.Int31n(5000)))

View the metric in Prometheus by querying rpc_req_size.

Create a new dashboard in Grafana that graphs rpc_req_size.

Histogram

Key parts of the Go code:

httpReqDurationsHistogram = prometheus.NewHistogramVec(
    prometheus.HistogramOpts{
        Name: "http_req_durations_histogram",
        Help: "http req latency distributions.",
        // 4 buckets, starting from 0.1 and adding 0.5 between each bucket
        Buckets: prometheus.LinearBuckets(0.1, 0.5, 4),
    },
    []string{"http_req_histogram"},
)


prometheus.MustRegister(httpReqDurationsHistogram)

v := rand.Float64()
httpReqDurationsHistogram.WithLabelValues("booksvc_req").Observe(1.5 * v)

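View the metric in Prometheus by querying http_req_durations_histogram_bucket.

The code defines 4 buckets, and the result contains the corresponding four bucket series (le="0.1", le="0.6", le="1.1", le="1.6").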
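The following standard-library sketch shows where those four le values come from and how bucket counts accumulate. linearBuckets mirrors the documented behavior of prometheus.LinearBuckets (start, then add the width repeatedly), and cumulativeCounts is a hypothetical helper; neither is the client library's code. The observations are made up. A histogram also has an implicit +Inf bucket that counts every observation.

```go
package main

import "fmt"

// linearBuckets returns count upper bounds: start, start+width, ...
func linearBuckets(start, width float64, count int) []float64 {
	bounds := make([]float64, count)
	for i := range bounds {
		bounds[i] = start
		start += width
	}
	return bounds
}

// cumulativeCounts returns, for each upper bound, how many observations
// are less than or equal to it. The extra last element is the +Inf bucket.
func cumulativeCounts(bounds, observations []float64) []int {
	counts := make([]int, len(bounds)+1)
	for _, v := range observations {
		for i, le := range bounds {
			if v <= le {
				counts[i]++ // buckets are cumulative, so v may land in several
			}
		}
		counts[len(bounds)]++ // +Inf always matches
	}
	return counts
}

func main() {
	bounds := linearBuckets(0.1, 0.5, 4)
	counts := cumulativeCounts(bounds, []float64{0.05, 0.3, 0.7, 1.2, 2.0})
	for i, le := range bounds {
		fmt.Printf("le=%.1f count=%d\n", le, counts[i])
	}
	fmt.Printf("le=+Inf count=%d\n", counts[len(bounds)])
}
```

Because the counts are cumulative, dividing a bucket's count by the +Inf count gives the fraction of requests that finished within that bucket's bound.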

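Create a new dashboard in Grafana.

The query is rate(http_req_durations_histogram_bucket[30s]), which gives the per-second rate of increase of each bucket's count over the last 30 seconds.

From the values shown, about 1.3% of requests completed within 0.1 s, 17.3% within 0.6 s, 34.7% within 1.1 s, and 60% within 1.6 s.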

Summary

Key parts of the Go code:

rpcDurations = prometheus.NewSummaryVec(
    prometheus.SummaryOpts{
        Name: "rpc_durations_seconds",
        Help: "RPC latency distributions.",
        // Quantile rank -> allowed absolute error of the estimate
        Objectives: map[float64]float64{0.5: 0.05, 0.9: 0.01, 0.99: 0.001},
    },
    []string{"service"},
)

prometheus.MustRegister(rpcDurations)

v = rand.Float64()
rpcDurations.WithLabelValues("user_rpc").Observe(v)

v = 0.5 + rand.Float64()
rpcDurations.WithLabelValues("book_rpc").Observe(v)

v = 1.0 + rand.Float64()
rpcDurations.WithLabelValues("bookshelf_rpc").Observe(v)
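To give a feel for what the quantile ranks in Objectives (0.5, 0.9, 0.99) mean, here is a small sketch that computes exact nearest-rank quantiles over a slice of observations. This is only an illustration: the client library estimates quantiles with a streaming algorithm over a sliding time window and does not keep and sort every sample. The quantile function and the durations are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// quantile returns the nearest-rank q-quantile (0 < q <= 1) of values:
// the smallest observation such that at least a fraction q of all
// observations are less than or equal to it.
func quantile(q float64, values []float64) float64 {
	if len(values) == 0 {
		return math.NaN()
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), values...)
	sort.Float64s(sorted)

	rank := int(math.Ceil(q * float64(len(sorted))))
	if rank < 1 {
		rank = 1
	}
	if rank > len(sorted) {
		rank = len(sorted)
	}
	return sorted[rank-1]
}

func main() {
	// Made-up RPC durations in seconds.
	durations := []float64{5, 1, 3, 2, 4, 10, 9, 8, 7, 6}
	for _, q := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("quantile %.2f = %v\n", q, quantile(q, durations))
	}
}
```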

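View the metric in Prometheus by querying rpc_durations_seconds.

Create a new dashboard in Grafana.

The query is rate(rpc_durations_seconds_sum[1m]), the per-second rate at which the sum of observed durations grows over the last minute.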

