
Tencent Cloud TDSQL Backup Failure: A Troubleshooting Case

source link: https://segmentfault.com/a/1190000040495217


Published on August 10

I. Why TDSQL Backups Matter:

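Some people say that distributed databases keep multiple replicas, so backups are unnecessary. I think that is only half right. Distributed databases do keep multiple replicas, but backups are still necessary: for example, to recover from accidental deletion (DROP TABLE or DROP DATABASE), to analyze historical data, or to survive compound disasters that destroy both the primary and standby copies. These events may be unlikely, but our data cannot withstand that kind of loss. So whether you run a traditional centralized database or one of today's popular distributed databases, you should back up for the sake of data safety.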

II. TDSQL Backup Overview:

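TDSQL supports both physical and logical backups. It automatically takes asynchronous, real-time backups from one replica, and it also backs up the binlog as an incremental backup. Using the most recent day's image together with the binlogs written since then, it can restore to a specified point in time. Physical backup is based on xtrabackup, which copies consistent data files at the storage level and records the binlog position. Logical backup is based on mydumper, which uses SELECT to read consistent data and records the binlog position. In both cases the core principle is the same: a consistent full copy of the data plus a binlog position.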
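
The point-in-time recovery idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not TDSQL's implementation: the `FullBackup` and `BinlogFile` records, their fields, and the `plan_restore` function are hypothetical names invented for this example, and binlog file names are assumed to be zero-padded so that they sort in write order. The sketch picks the newest full image taken at or before the target time, then the binlog files needed to roll forward from the image's binlog position.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass
class FullBackup:
    """Hypothetical record describing one consistent full image."""
    taken_at: datetime   # when the image was taken
    binlog_file: str     # binlog file recorded together with the image
    binlog_pos: int      # position inside that binlog file


@dataclass
class BinlogFile:
    """Hypothetical record describing one archived binlog file."""
    name: str              # assumed zero-padded, e.g. "binlog.000012"
    first_event: datetime  # timestamp of the first event in the file


def plan_restore(
    images: List[FullBackup],
    binlogs: List[BinlogFile],
    target: datetime,
) -> Tuple[FullBackup, List[BinlogFile]]:
    """Choose the image and the binlog files to replay up to `target`."""
    candidates = [img for img in images if img.taken_at <= target]
    if not candidates:
        raise ValueError("no full backup exists at or before the target time")

    # Newest image that is not later than the target time.
    image = max(candidates, key=lambda img: img.taken_at)

    # Replay starts in the file recorded with the image and stops at the
    # last file whose first event is not later than the target time.
    ordered = sorted(binlogs, key=lambda b: b.name)
    replay = [
        b for b in ordered
        if b.name >= image.binlog_file and b.first_event <= target
    ]
    return image, replay
```

In a real restore the first replayed file would be read starting at `binlog_pos`, and the last one would be cut off at the target timestamp.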

III. A TDSQL Backup Failure Case:

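While backing up the configuration database (instance 4001) with TDSQL physical backup, the following message appeared:

Error occurred,see mysqlagent log for detail.

The details are as follows: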

1. Check the log on the backup node:

[root@huyidb03 nohup]# pwd
/data/tdsql_run/4001/mysqlagent/log/nohup
[root@huyidb03 nohup]# ll
total 48
-rw-rw-rw-. 1 tdsql users 3 Dec 12 15:53 coldbackupbinlog_result_4001
-rw-rw-rw-. 1 tdsql users 34077 Dec 12 15:49 coldbackupimage_4001_2020-12-12
-rw-rw-rw-. 1 tdsql users 3 Dec 12 15:27 coldbackup_result_4001
-rw-r--r--. 1 tdsql users 3 Dec 12 15:49 manualBackupResult_4001_0000000000.log

tail -100f coldbackupimage_4001_2020-12-12
.so' (errno: 2, cannot open shared object file: No such file or directory)
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xb_stream_write_data() failed.
xtrabackup: Error: write to logfile failed
innobackupex: Error writing file 'UNOPENED' (Errcode: 32 - Broken pipe)
xtrabackup: Error: xtrabackup_copy_logfile() failed.

real 0m0.459s
user 0m0.420s
sys 0m0.063s
pipe:1 141 0 1,result:-1
end--- -1 2020-12-12 15:49:23
ERROR: JAVA_HOME is not set and could not be found.

The log ends with a JAVA_HOME error, so reload the profile on the backup node and confirm that Java is available:

[root@huyidb03 nohup]# source /etc/profile
[root@huyidb03 nohup]# java -version
java version "1.8.0_152"
Java(TM) SE Runtime Environment (build 1.8.0_152-b16)
Java HotSpot(TM) 64-Bit Server VM (build 25.152-b16, mixed mode)
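
The same check can be scripted so that it runs before a backup is started. The following is a minimal sketch, assuming only that a usable Java installation means `JAVA_HOME` points at a directory containing `bin/java` or that `java` is found on `PATH`; the function name `check_java_env` is made up for this example and is not part of TDSQL or Hadoop.

```python
import os
import shutil
from typing import Mapping, Optional


def check_java_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report whether Java is visible in the given environment.

    `env` defaults to the environment of the current process, which is
    what a child process such as a backup script would inherit.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME", "")
    java_bin = os.path.join(java_home, "bin", "java") if java_home else ""
    return {
        "java_home": java_home or None,
        "java_home_has_binary": bool(java_bin) and os.path.isfile(java_bin),
        "java_on_path": shutil.which("java", path=env.get("PATH", "")) is not None,
    }


if __name__ == "__main__":
    print(check_java_env())
```

Note that a shell that has just run `source /etc/profile` can pass this check while a daemon started earlier still runs with its old environment, which matters for the analysis at the end of this article.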

Restart the agent as the tdsql user to verify:

[root@huyidb03 nohup]# su - tdsql
[tdsql@huyidb03 nohup]# cd /data/tdsql_run/4001/mysqlagent/bin
[tdsql@huyidb03 bin]# ./restartreport_cgroup.sh ../conf/mysqlagent_4001.xml
stop mysqlreport success
start mysqlreport success

The log still reports errors at this point: the node cannot connect to Hadoop.

details see: http://wiki.apache.org/hadoop...
ls: Call From huyidb03/10.85.10.53 to huyidb01:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop...
put: Call From huyidb03/10.85.10.53 to huyidb01:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop...
appendToFile: Call From huyidb03/10.85.10.53 to huyidb01:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop...
put: Call From huyidb03/10.85.10.53 to huyidb01:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop...
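
Error lines like these name the endpoint that refused the connection, so the target can be extracted and probed directly. The sketch below is an illustration only: the regular expression is written against the message format seen in the log above, and `parse_refused_target` and `port_is_open` are hypothetical helper names. A plain TCP connect shows whether anything is listening on that port. It does not prove that HDFS itself is healthy.

```python
import re
import socket
from typing import Optional, Tuple

# Matches e.g. "Call From huyidb03/10.85.10.53 to huyidb01:9002 failed on
# connection exception", as seen in the log output above.
_CALL_RE = re.compile(
    r"Call From (?P<src>\S+) to (?P<host>[^\s:]+):(?P<port>\d+) "
    r"failed on connection exception"
)


def parse_refused_target(line: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) of the endpoint named in the error line, if any."""
    match = _CALL_RE.search(line)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False
```

For example, `parse_refused_target` applied to the `ls:` line above yields `("huyidb01", 9002)`, which can then be passed to `port_is_open` from the backup node.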

Check on the Hadoop node:

[tdsql@huyidb01 ~]$ hadoop fs -ls /
ls: Call From huyidb01/10.85.10.51 to huyidb01:9002 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop...

Check Hadoop again:

[tdsql@huyidb01 ~]$ hadoop fs -ls /
Found 1 items
drwxr-xr-x - tdsql supergroup 0 2020-12-12 12:37 /tdsqlbackup

Check the Hadoop mount status on the backup node:

[root@huyidb03 nohup]# hadoop fs -ls /
Found 1 items
drwxr-xr-x - tdsql supergroup 0 2020-12-12 12:37 /tdsqlbackup
[root@huyidb03 nohup]#

Start the backup again on the backup node. This time it succeeds.


IV. Root Cause Analysis:

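A preliminary analysis: instance 4001 and its agent were installed first, and Hadoop was installed afterwards. After Hadoop was installed, the 4001 agent was not restarted to pick up the new configuration. That is why the backup of instance 4001 failed while the other instances backed up normally.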
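
One way to test this hypothesis on Linux is to look at the environment that the running agent process was actually started with, which the kernel exposes as `/proc/<pid>/environ` (NUL-separated `KEY=value` entries). The sketch below is an assumption-laden illustration: the helper names are invented, the list of required variables contains only `JAVA_HOME` because that is the one the log complained about, and reading another user's process usually requires root or the same user.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Parse the NUL-separated contents of /proc/<pid>/environ."""
    env: Dict[str, str] = {}
    for entry in raw.split(b"\0"):
        if not entry or b"=" not in entry:
            continue  # skip the trailing empty entry and malformed items
        key, _, value = entry.partition(b"=")
        env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def process_env(pid: int) -> Dict[str, str]:
    """Return the start-up environment of a running process (Linux only)."""
    return parse_environ(Path(f"/proc/{pid}/environ").read_bytes())


def missing_vars(env: Dict[str, str],
                 required: Iterable[str] = ("JAVA_HOME",)) -> List[str]:
    """List required variables that are absent or empty in `env`."""
    return [name for name in required if not env.get(name)]
```

If `missing_vars(process_env(agent_pid))` reports `JAVA_HOME` for the agent of one instance but not for the others, that would be consistent with the agent having been started before the Hadoop and Java settings were in place. Here `agent_pid` stands for the process ID of the instance's mysqlagent, which has to be looked up separately.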

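[Copyright notice] This article is original content by 胡毅 of 云贝学院. Reprints must credit the source (云贝学院) and include the article link, the author, and other basic information.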

