日常问题: SQL优化 - Ryan.Miao
source link: https://www.cnblogs.com/woshimrf/p/16632961.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
日常问题: SQL优化
日常开发中,除了开辟新项目,业务需求开发,一般还要做负责系统的日常运维。比如线上告警了,出bug了,必须及时修复。这天,运维反馈mysql cpu告警了,然后抓了该时间节点的慢sql日志,要开发分析解决。
拿到的慢sql日志:
# Query 1: 1.16 QPS, 1.96x concurrency, ID 0x338A0AEE1CFE3C1D at byte 7687104
# This item is included in the report because it matches --limit.
# Scores: V/M = 0.02
# Time range: 2022-08-12T16:30:00 to 2022-08-12T17:11:32
# Attribute pct total min max avg 95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count 99 2880
# Exec time 99 4893s 1s 2s 2s 2s 172ms 2s
# Lock time 99 187ms 52us 343us 64us 84us 11us 60us
# Rows sent 97 248 0 1 0.09 0.99 0.28 0
# Rows examine 96 871.46M 308.56k 311.13k 309.85k 298.06k 0 298.06k
# Query size 99 812.81k 289 289 289 289 0 289
# String:
# Hosts 10.22.9.183 (742/25%), 10.26.9.126 (730/25%)... 2 more
# Users order
# Query_time distribution
# 1us
# 10us
# 100us
# 1ms
# 10ms
# 100ms
# 1s ################################################################
# 10s+
# Tables
# SHOW TABLE STATUS LIKE 'serial_number_store'\G
# SHOW CREATE TABLE `serial_number_store`\G
# EXPLAIN /*!50100 PARTITIONS*/
select *
from serial_number_store sn
where 1=1
and company_code = '8511378117'
and warehouse_code = '851'
and sku_no = '6902952880'
and (serial_no = '5007894' or sub_serial_no = 'v')\G
查询数据库定义,发现定义了几个index
PRIMARY KEY (`ID`),
KEY `IDX_SERIAL_NUMBER_2` (`WAREHOUSE_CODE`),
KEY `IDX_SERIAL_NUMBER_3` (`SKU_NO`),
KEY `IDX_SERIAL_NUMBER_4` (`SERIAL_NO`),
KEY `IDX_SERIAL_NUMBER_5` (`SUB_SERIAL_NO`),
KEY `IDX_SERIAL_NUMBER_6` (`SKU_NAME`),
KEY `IDX_SERIAL_NUMBER_1` (`COMPANY_CODE`,`WAREHOUSE_CODE`,`SKU_NO`,`SERIAL_NO`) USING BTREE
按最左匹配原则,这条sql应该只会命中一个索引。因为or的另一半无法match。
explain发现实际执行计划:
key: IDX_SERIAL_NUMBER_3
key_len: 259
ref: const
rows: 45864
filtered: 0.95
Extra: Using where
表总数量: 13658763
or的优化技巧之一就是拆成2个可以命中索引的sql, 然后union all.
优化为union all
explain select *
from serial_number_store sn
where company_code = '9311046897'
and warehouse_code = '931DCA'
and sku_no = '6935117818696'
and serial_no = '862517054251459'
union all
select *
from serial_number_store sn
where company_code = '9311046897'
and warehouse_code = '931DCA'
and sku_no = '6935117818696'
and sub_serial_no = '862517054251459';
最终explain
key: IDX_SERIAL_NUMBER_4 IDX_SERIAL_NUMBER_5
ref: const const
rows: 1 1
filtered: 5.0 5.0
extra: using where
正常到这里,找到解决方案,就算完事了。但作为线上问题的处理,你得分析为啥以前没事,现在出问题了。
查询对应的链路追踪情况:
和猜测一致,短时间内批量查询。几乎每条sql2s多耗时。虽然是后台任务,但数据量太大导致cpu 100%.
定位实际的代码,mybatis是这么写:
<sql id="servialNumberStoreEntityParams">
<if test="id!=null and id!=''"> and ID = #{id}</if>
<if test="companyCode!=null and companyCode!=''"> and company_code = #{companyCode}</if>
<if test="warehouseCode!=null and warehouseCode!=''"> and warehouse_code = #{warehouseCode}</if>
<if test="sku!=null and sku!=''"> and sku_no = #{sku}</if>
<if test="serialNo!=null and serialNo!=''"> and (serial_no = #{serialNo} or sub_serial_no = #{serialNo})</if>
<if test="lotNum!=null and lotNum!=''"> and lot_num = #{lotNum}</if>
</sql>
这个查询片段有多个sql引用了。比如
select *
from serial_number_store sn
where 1=1
<include refid="servialNumberStoreEntityParams" />
改造成union也不是不行,比如
select *
from serial_number_store sn
where 1=1
<include refid="servialNumberStoreEntityParams" />
<if test="serialNo!=null and serialNo!=''">
and serial_no = #{serialNo}
union all
select *
from cwsp_tb_serial_number_store sn
where 1=1
<include refid="servialNumberStoreEntityParams" />
and sub_serial_no = #{serialNo}
</if>
但前面说了多个片段引用了,对应多个sql查询方法,然后这多个sql查询方法又会对应多个业务调用。那问题来了,如果改完要测的话,业务场景该怎么测?一时犹豫了,要不要再花额外的时间去搞回归测试,验证。
和运维小哥说,反正是个后台任务,先不改吧。运维看没影响到业务(没人投诉)也就不管了。
然后第二天上班又收到了告警。逃不掉了。
定位代码的时候,发现有个update
<update id="update">
update serial_number_store
<set>
<if test="companyCode!=null and companyCode!=''"> COMPANY_CODE = #{companyCode},</if>
<if test="warehouseCode!=null and warehouseCode!=''"> WAREHOUSE_CODE = #{warehouseCode},</if>
<if test="sku!=null and sku!=''"> SKU_NO = #{sku},</if>
<if test="serialNo!=null and serialNo!=''"> SERIAL_NO = #{serialNo},</if>
<if test="subSerialNo!=null and subSerialNo!=''"> SUB_SERIAL_NO = #{subSerialNo},</if>
<if test="erpno!=null and erpno!=''"> ERP_ORDER = #{erpno},</if>
<if test="docType!=null and docType!=''"> DOCTYPE = #{docType},</if>
<if test="editTime!=null and editTime!=''"> EDITTM = #{editTime},</if>
<if test="editWho!=null and editWho!=''"> EDITEMP = #{editWho},</if>
</set>
where 1=1
<include refid="servialNumberStoreEntityParams" />
</update>
这种sql,假如参数没传,岂不是全表被覆盖? 当然,也能改。前提是梳理调用链路,把这些sql引用的业务场景梳理一遍,确定入参场景,然后修改,然后再模拟场景做测试。想想整个流程,1天不知道搞不搞的定,测试上线等等,还有更长的流程。
这种在设计之初就应该做好优化设计而不是出了问题再改,但当接手古老系统的时候,开发可能换了一波又一波了,这时候除了吐槽之外,只能填坑。与此同时,自己所开发的代码,在若干时间后,也许会被另外一个人吐槽(如果自己发现的坑是自己挖的,自然不会吐槽自己)
__EOF__
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK