抛砖系列之redis监控命令 - 踩刀诗人
source link: https://www.cnblogs.com/chopper-poet/p/16819011.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
redis是一款非常流行的kv数据库,以高性能著称,其高吞吐、低延迟等特性让广大开发者趋之若鹜,每每看到别人发出的redis故障报告都让我产生一种居安思危,以史为鉴的危机感,恰逢今年十一西安烟雨不断,抽时间学习了几个redis监控命令,和大家分享一波。
redis-cli --stat【连续统计】
redis-cli --stat 默认每秒输出一条新行,其中包含有用信息和每个采集点的请求次数差异。使用此命令可以轻松了解内存使用情况、客户端连接计数以及有关已连接 Redis 数据库的各种其他统计信息。
可以使用-i修改采样频率,默认值为1秒,如下面这个命令代表每2s采集一次数据:
redis-cli --stat -i 2 ------- data ------ --------------------- load -------------------- - child - keys mem clients blocked requests connections 8890 131.89M 47 0 1705992846 (+0) 2595 8890 131.93M 47 0 1705992897 (+51) 2595 8890 131.93M 47 0 1705992954 (+57) 2595 8890 131.97M 47 0 1705992991 (+37) 2595 8890 131.89M 47 0 1705993043 (+52) 2595 8890 131.97M 47 0 1705993088 (+45) 2595 8890 132.01M 47 0 1705993122 (+34) 2595 8890 132.01M 47 0 1705993168 (+46) 2595 8890 132.01M 47 0 1705993194 (+26) 2595 8890 131.93M 47 0 1705993267 (+73) 2595 |
redis-cli --bigkeys【统计大key】
这个命令用作键空间分析器,它扫描数据集中的大键,但也提供有关数据集所包含的数据类型的信息。
# redis-cli --bigkeys # Scanning the entire keyspace to find biggest keys as well as # average sizes per key type. You can use -i 0.1 to sleep 0.1 sec # per 100 SCAN commands (not usually needed). [00.00%] Biggest hash found so far '"hash_big"' with 6 fields [00.00%] Biggest set found so far '"set_big"' with 6 members [00.00%] Biggest string found so far '"string_big"' with 979 bytes [00.00%] Biggest string found so far '"string_big_2"' with 1365 bytes -------- summary ------- Sampled 5 keys in the keyspace! Total key length in bytes is 38 (avg len 7.60) Biggest hash found '"hash_big"' has 6 fields Biggest string found '"string_big_2"' has 1365 bytes Biggest set found '"set_big"' has 6 members 0 lists with 0 items (00.00% of keys, avg size 0.00) 1 hashs with 6 fields (20.00% of keys, avg size 6.00) 3 strings with 2420 bytes (60.00% of keys, avg size 806.67) 0 streams with 0 entries (00.00% of keys, avg size 0.00) 1 sets with 6 members (20.00% of keys, avg size 6.00) 0 zsets with 0 members (00.00% of keys, avg size 0.00) |
在输出的第一部分中,将报告遇到的每个大于前一个较大key(相同类型)的新key。摘要部分提供有关 Redis 实例内数据的一般统计信息。
该程序使用 SCAN 命令,因此它可以在繁忙的服务器上执行而不会影响操作,当然也可以使用-i选项来限制每个 SCAN 命令的指定秒数部分的扫描过程。
例如,redis-cli --bigkeys -i 1 代表每次SCAN执行之后sleep 1s。
可以看到--bigkeys给出了每种数据结构的top 1 bigkey,同时给出了每种数据类型的键值个数以及平均大小。
redis-cli monitor【监控命令执行】
redis-cli monitor 1665128881.578949 [0 127.0.0.1:46046] "COMMAND" "DOCS" 1665128885.870333 [0 127.0.0.1:46046] "get" "a" 1665128891.200705 [0 127.0.0.1:46046] "set" "a" "asdfasdfasd" "asdfasdf" 1665128897.234390 [0 127.0.0.1:46046] "sadd" "test" "aaa" 1665128902.439247 [0 127.0.0.1:46046] "smembers" "test" 1665128906.257225 [0 127.0.0.1:46046] "smembers" "test" 1665128910.073980 [0 127.0.0.1:46046] "smembers" "test" 1665128914.688753 [0 127.0.0.1:46046] "hget" "all" "hello" 1665128918.006031 [0 127.0.0.1:46046] "hget" "all" "hello" |
可以看到目前smembers和hget命令执行的比较频繁,可能是异常流量导致,需要引起我们的注意了。
更方便的是redis-cli monitor可以和管道配合使用,比如redis-cli monitor | grep goods_test_001
redis-cli monitor |grep goods_test_001 1665129150.063322 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129150.935202 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129151.486148 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129152.012097 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129152.550077 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129153.059130 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129153.595023 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129154.166608 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129154.687753 [0 127.0.0.1:46046] "get" "goods_test_001" 1665129155.204012 [0 127.0.0.1:46046] "get" "goods_test_001" |
结合grep goods_test_001可以发现goods_test_001这个key当前有大量的读请求。
Pub/sub mode【发布订阅模式】
redis-cli可以用来发布/订阅消息,如果你的系统中使用了redis的发布订阅功能,可以使用redis-cli的这一特性来进行一些调试工作。
比如,使用redis-cli发布一条消息到mychannel
redis-cli publish mychannel helloworld |
同样的,使用redis-cli订阅mychannel发来的消息
redis-cli subscribe mychannel Reading messages... (press Ctrl-C to quit) 1) "subscribe" 2) "mychannel" 3) (integer) 1 1) "message" 2) "mychannel" 3) "helloworld" |
Monitoring the latency of Redis instances【监控延迟】
redis-cli提供了多种工具帮助我们发现延迟,涉及的指标有最小值、最大值、平均值、延迟分布情况等。
基本的延迟检查工具是redis-cli --latency。使用--latency,redis-cli 运行一个循环,以每秒100次的速度向redis发送PING命令,并测量收到回复的时间,统计信息在控制台中实时更新。
# redis-cli --latency min: 0, max: 3, avg: 0.28 (536 samples) |
统计数据以毫秒为单位,上面的测试一共发了536个PING命令,最小响应时间为0毫秒(0不代表没有延迟,只是说毫秒统计不到),最大为3毫秒,平均值为0.28毫秒。
有时我们更希望看到redis延迟变化的趋势,这时--latency-history就可以派上用场,它的工作机制和--latency相同,只是每15秒(默认)重新开启一个测试会话。
redis-cli --latency-history min: 0, max: 7, avg: 0.25 (1432 samples) -- 15.00 seconds range min: 0, max: 1, avg: 0.24 (1435 samples) -- 15.00 seconds range min: 0, max: 15, avg: 0.27 (1429 samples) -- 15.01 seconds range min: 0, max: 5, avg: 0.28 (1431 samples) -- 15.01 seconds range min: 0, max: 5007, avg: 7.71 (839 samples) -- 15.01 seconds range min: 1, max: 18, avg: 3.58 (1092 samples) -- 15.01 seconds range min: 0, max: 13, avg: 3.56 (1093 samples) -- 15.01 seconds range min: 1, max: 15, avg: 3.61 (1090 samples) -- 15.00 seconds range min: 1, max: 17, avg: 3.60 (1091 samples) -- 15.01 seconds range min: 0, max: 26, avg: 2.57 (1178 samples) -- 15.00 seconds range |
可以看到,上面每隔15秒输出一组数据,在第5个15秒开始时耗时明显增加。
内部还实现了另一个非比寻常的延迟检测工具,它不检查 Redis 实例的延迟,而是检查运行的计算机的延迟,此延迟是内核计划程序、虚拟机管理程序(如果是虚拟化实例)等所固有的。
redis称之为内在延迟,因为它对程序员来说基本上是不透明的,如果 redis 实例具有高延迟,检查其他因素之外,还值得检查内核本身的延迟。
通过测量内在延迟,我们就知道这是基准,redis 无法超越内核,使用redis-cli --intrinsic-latency <持续时间>开启测试,持续时间5秒。
redis-cli --intrinsic-latency 5 Max latency so far: 1 microseconds. Max latency so far: 16 microseconds. Max latency so far: 70 microseconds. Max latency so far: 109 microseconds. Max latency so far: 145 microseconds. Max latency so far: 205 microseconds. Max latency so far: 283 microseconds. Max latency so far: 363 microseconds. Max latency so far: 2507 microseconds. Max latency so far: 4541 microseconds. 100063828 total runs (avg latency: 0.0500 microseconds / 49.97 nanoseconds per run). Worst run took 90878x longer than the average latency. # redis-cli --intrinsic-latency 5 Max latency so far: 1 microseconds. Max latency so far: 39 microseconds. Max latency so far: 41 microseconds. Max latency so far: 45 microseconds. Max latency so far: 62 microseconds. Max latency so far: 8839 microseconds. Max latency so far: 9357 microseconds. Max latency so far: 10310 microseconds. Max latency so far: 10322 microseconds. Max latency so far: 10573 microseconds. Max latency so far: 10682 microseconds. Max latency so far: 11177 microseconds. Max latency so far: 11514 microseconds. 35539207 total runs (avg latency: 0.1407 microseconds / 140.69 nanoseconds per run). Worst run took 81840x longer than the average latency. |
注意:--intrinsic-latency只能在redis实例所在机器运行。
从上面的输出可以看到内核的最大延迟达到了11514微秒(115毫秒左右),也从侧面说明执行redis命令的最大延迟起码在115毫秒之上。
Replica mode【副本模式】
redis-cli --replica sending REPLCONF capa eof sending REPLCONF rdb-filter-only SYNC with master, discarding bytes of bulk transfer until EOF marker... SYNC done after 211 bytes. Logging commands from master. sending REPLCONF ACK 0 "ping" "SELECT" , "0" "set" , "a" , "b" "hset" , "hash" , "name" , "jack" |
可以看到主节点上执行了set,hset等指令,命令行实时输出。
如果你正在开发一个跨机房同步的redis同步工具,当你的从节点未按预期收到指令时,就可以使用这一命令做一些调试和诊断,为了方便理解,我放一张老东家自研的redis跨机房同步工具流程图。
Performing an LRU simulation【模拟LRU访问】
该工具使用80/20法则来执行 GET 、SET操作 ,意味着 20% 的key将在 80% 的次数内被请求,这符合一般缓存场景中的请求分布。
我们假设给redis分配的内存为10兆,内存驱逐策略为allkeys-lru,预期有100万个key,期望命中率是90%,测试一下看是否符合预期:
# 设置最大内存10兆 config set maxmemory 10MB # lru-test redis-cli --lru-test 1000000 119250 Gets/sec | Hits: 43654 (36.61%) | Misses: 75596 (63.39%) 125250 Gets/sec | Hits: 46002 (36.73%) | Misses: 79248 (63.27%) 127500 Gets/sec | Hits: 46860 (36.75%) | Misses: 80640 (63.25%) 122500 Gets/sec | Hits: 45228 (36.92%) | Misses: 77272 (63.08%) 126750 Gets/sec | Hits: 46623 (36.78%) | Misses: 80127 (63.22%) 125250 Gets/sec | Hits: 46150 (36.85%) | Misses: 79100 (63.15%) 120000 Gets/sec | Hits: 43962 (36.63%) | Misses: 76038 (63.37%) 121000 Gets/sec | Hits: 44630 (36.88%) | Misses: 76370 (63.12%) 123250 Gets/sec | Hits: 45616 (37.01%) | Misses: 77634 (62.99%) |
命中率明显不符合预期,36%离90%相差甚远,我们将maxmemory扩大一倍接着测试
# 设置最大内存20兆 config set maxmemory 20MB # lru-test redis-cli --lru-test 1000000 134500 Gets/sec | Hits: 65181 (48.46%) | Misses: 69319 (51.54%) 133500 Gets/sec | Hits: 86515 (64.81%) | Misses: 46985 (35.19%) 133000 Gets/sec | Hits: 98930 (74.38%) | Misses: 34070 (25.62%) 123500 Gets/sec | Hits: 95223 (77.10%) | Misses: 28277 (22.90%) 122000 Gets/sec | Hits: 94237 (77.24%) | Misses: 27763 (22.76%) 122250 Gets/sec | Hits: 94430 (77.24%) | Misses: 27820 (22.76%) 122500 Gets/sec | Hits: 94564 (77.20%) | Misses: 27936 (22.80%) 124000 Gets/sec | Hits: 95517 (77.03%) | Misses: 28483 (22.97%) 125000 Gets/sec | Hits: 96723 (77.38%) | Misses: 28277 (22.62%) 129000 Gets/sec | Hits: 99839 (77.39%) | Misses: 29161 (22.61%) |
内存增加一倍以后命中率达到了77%左右,继续调整maxmemory直到符合预期。
redis-cli --lru-test切记不要在生产环境使用,会给服务器带来较大压力;
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK