
Notes on problems encountered while deploying DataX and DataX-web - 天边ㄨ流星

 1 year ago
source link: https://www.cnblogs.com/xiaol0225/p/17491527.html


My first blog post.

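There are already plenty of DataX deployment guides online, so I will not repeat them here. This post only records a few problems I ran into along the way.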

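1. DataX cannot connect to the MySQL database

ERROR RetryUtil - Exception when calling callable, 异常Msg:DataX无法连接对应的数据库,可能原因是:1) 配置的ip/port/database/jdbc错误,无法连接。2) 配置的username/password错误,鉴权失败。请和DBA确认该数据库的连接信息是否正确。
2023-06-19 15:10:52 [AnalysisStatistics.analysisStatisticsLog-53] java.lang.Exception: DataX无法连接对应的数据库,可能原因是:1) 配置的ip/port/database/jdbc错误,无法连接。2) 配置的username/password错误,鉴权失败。请和DBA确认该数据库的连接信息是否正确。

(The message reads: "DataX cannot connect to the database. Possible causes: 1) the configured ip/port/database/jdbc is wrong, so the connection fails; 2) the configured username/password is wrong, so authentication fails. Please confirm the connection details with your DBA.")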

Solution:

cd /opt/datax/plugin/reader/mysqlreader/libs 

In this directory, delete the older MySQL driver jar and replace it with mysql-connector-java-8.0.*.jar
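Before deleting anything, it can help to list which driver jars in that directory are actually older than 8.0. The following is a minimal sketch, assuming the jars follow the usual mysql-connector-java-<version>.jar naming; the helper name find_old_mysql_drivers is hypothetical and not part of DataX.

```python
import re
from pathlib import Path

# Matches names such as mysql-connector-java-5.1.47.jar and captures "5.1.47".
DRIVER_RE = re.compile(r"^mysql-connector-java-(\d+(?:\.\d+)*)\.jar$")


def find_old_mysql_drivers(libs_dir, wanted_prefix="8.0."):
    """Return driver jar names in libs_dir whose version does not start with wanted_prefix."""
    old = []
    for jar in sorted(Path(libs_dir).glob("mysql-connector-java-*.jar")):
        match = DRIVER_RE.match(jar.name)
        if match and not match.group(1).startswith(wanted_prefix):
            old.append(jar.name)
    return old


if __name__ == "__main__":
    # Path taken from the post; adjust it to your own installation.
    libs = "/opt/datax/plugin/reader/mysqlreader/libs"
    for name in find_old_mysql_drivers(libs):
        print("old driver, candidate for removal:", name)
```

The script only reports; removing the old jar and copying in the 8.0.x one is still done by hand, as described above.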

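2. When syncing data from MySQL to HDFS, multiple files with the configured prefix appear and the data in them is duplicated; older versions of DataX do not fix this duplication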

The cause is that hdfswriter's writeMode only supports append and nonConflict. You can download the source code, find the corresponding module, and add an overwrite feature.

Source code: alibaba/DataX (github.com), the open-source version of Alibaba Cloud DataWorks data integration.

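Fortunately, by the time I hit this problem the master branch had already fixed it. After downloading, compiling, and installing it, writeMode gains a truncate option: when a job runs, any files in the HDFS directory that match the configured prefix are deleted first.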

However, the drop-down list in datax-web 2.1.2 does not offer this option, so you have to edit the job JSON yourself:

"path": "/warehouse/jzg_ga_prod/ods/ods_t_message_template",
"fileName": "00000",
"writeMode": "truncate",
"fieldDelimiter": "\t",
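If many jobs need the same change, the edit can be scripted rather than made by hand. This is a rough sketch, assuming the standard DataX job layout (job.content[].writer.parameter); the function name set_hdfs_write_mode and the sample job are hypothetical, and only the writer fields are taken from the fragment above.

```python
import json


def set_hdfs_write_mode(job, mode="truncate"):
    """Set writeMode on every hdfswriter in a DataX job dict; return how many were changed."""
    changed = 0
    for item in job.get("job", {}).get("content", []):
        writer = item.get("writer", {})
        if writer.get("name") == "hdfswriter":
            writer.setdefault("parameter", {})["writeMode"] = mode
            changed += 1
    return changed


if __name__ == "__main__":
    # Hypothetical job; the reader is a placeholder, the writer fields come from the post.
    job = {"job": {"content": [{
        "reader": {"name": "mysqlreader", "parameter": {}},
        "writer": {"name": "hdfswriter", "parameter": {
            "path": "/warehouse/jzg_ga_prod/ods/ods_t_message_template",
            "fileName": "00000",
            "writeMode": "append",
            "fieldDelimiter": "\t",
        }},
    }]}}
    set_hdfs_write_mode(job)
    print(json.dumps(job, indent=2, ensure_ascii=False))
```

Note that truncate only takes effect with a DataX build that supports it, such as the master build mentioned above.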

3. Dirty data causes the sync from HDFS to MySQL to fail

 UnstructuredStorageReaderUtil - CsvReader使用默认值[{"captureRawRecord":true,"columnCount":0,"comment":"#","currentRecord":-1,"delimiter":"\t","escapeMode":1,"headerCount":0,"rawRecord":"","recordDelimiter":"\u0000","safetySwitch":false,"skipEmptyRecords":true,"textQualifier":"\"","trimWhitespace":true,"useComments":false,"useTextQualifier":true,"values":[]}],csvReaderConfig值为[null]
2023-06-19 15:53:30 [AnalysisStatistics.analysisStatisticsLog-53] 2023-06-19 15:53:30.209 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 
2023-06-19 15:53:30 [AnalysisStatistics.analysisStatisticsLog-53] {"record":[{"byteSize":1,"index":0,"rawData":1,"type":3},{"byteSize":7,"index":1,"rawData":"注册账号验证码","type":5},{"byteSize":1,"index":2,"rawData":1,"type":3},{"byteSize":28,"index":3,"rawData":"您的验证码为:${code},如非本人操作,请忽略本短信","type":5},{"byteSize":1,"index":4,"rawData":"2","type":5},{"byteSize":1,"index":5,"rawData":"2","type":5},{"byteSize":1,"index":6,"rawData":0,"type":3},{"byteSize":1,"index":7,"rawData":"3","type":5},{"byteSize":0,"index":8,"rawData":"","type":5},{"byteSize":0,"index":9,"rawData":"","type":5}],"type":"reader","message":"No enum constant com.alibaba.datax.plugin.unstructuredstorage.reader.UnstructuredStorageReaderUtil.Type.BIGINT"}
2023-06-19 15:53:30 [AnalysisStatistics.analysisStatisticsLog-53] 2023-06-19 15:53:30.209 [0-0-0-reader] ERROR StdoutPluginCollector - 脏数据: 

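Cause: a DataX data type problem. In the log, 脏数据 means "dirty data", and the final message "No enum constant ... Type.BIGINT" shows that BIGINT is not a type the reader recognizes. See the type mapping published online (source: https://www.jianshu.com/p/02e78ff57437)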

[Image: data type mapping table from the source cited above]

Solution: change the bigint and int column types to long.
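The same substitution can be applied to a reader's column list programmatically. Below is a minimal sketch, assuming each column is a dict with a "type" key as in typical DataX reader configs; fix_column_types is a hypothetical helper, and it covers only the two types named above.

```python
# Mapping stated in the post: bigint and int become long. Other types are left untouched.
TYPE_FIXES = {"bigint": "long", "int": "long"}


def fix_column_types(columns):
    """Return a copy of a reader column list with bigint/int replaced by long."""
    fixed = []
    for col in columns:
        col = dict(col)  # copy so the caller's config is not modified
        declared = str(col.get("type", "")).lower()
        if declared in TYPE_FIXES:
            col["type"] = TYPE_FIXES[declared]
        fixed.append(col)
    return fixed


if __name__ == "__main__":
    # Hypothetical column list for illustration.
    columns = [
        {"index": 0, "type": "bigint"},
        {"index": 1, "type": "string"},
        {"index": 2, "type": "int"},
    ]
    for col in fix_column_types(columns):
        print(col)
```

The comparison is case-insensitive because the error message reports the type in upper case (Type.BIGINT).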

