【spark】Model Persistence
October 6, 2019. Author: Guofei
Category: 1-1-Algorithm Platform. Article No.: 173
Copyright notice: the author of this article is Guo Fei. Feel free to repost, but please cite the original link and notify the author.
Original link: https://www.guofei.site/2019/10/06/spark_serialize.html
PickleSerializer
Prepare the model
import numpy as np
from sklearn import linear_model

# Fit a toy linear model: y ≈ x + noise
lm = linear_model.LinearRegression()
x = np.random.rand(1000, 1)
y = x + 0.1 * np.random.rand(1000, 1)
lm.fit(x, y)
Convert the model to bytes
from pyspark import PickleSerializer

ps = PickleSerializer()
model_str = ps.dumps(obj=lm)
# model_str is bytes; it can now be stored in Hive.
# Storing to Hive is omitted here: first convert to str with str(model_str), then write it.
# When reading back from Hive, use eval() to turn the string back into bytes.
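Round-tripping bytes through str() and eval() is fragile, and eval() on untrusted strings is a security risk. A safer pattern, sketched below with the standard library's pickle (which PickleSerializer wraps) and base64, encodes the bytes as plain ASCII text before writing to a Hive STRING column; the dict here is just a picklable stand-in for the fitted lm:

```python
import base64
import pickle

# Any picklable object stands in for the fitted model `lm`
model = {"coef": [1.0], "intercept": 0.1}

model_bytes = pickle.dumps(model)  # bytes, analogous to ps.dumps(lm)
model_text = base64.b64encode(model_bytes).decode("ascii")  # safe to store as a Hive STRING

# Reading back: base64-decode the text column, then unpickle
restored = pickle.loads(base64.b64decode(model_text.encode("ascii")))
assert restored == model
```

Base64 avoids eval() entirely and survives any text column faithfully, at the cost of roughly 33% size overhead.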
Convert bytes back to a model
from pyspark import PickleSerializer

ps = PickleSerializer()
model_load = ps.loads(model_str)
model_load.predict([[0.1]])
Additionally
The serializer can also handle an iterator via the following methods, though I haven't tried them yet:
ps.dump_stream
ps.load_stream
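dump_stream(iterator, stream) writes each element of an iterator to a file-like stream, and load_stream(stream) yields them back. The same idea can be sketched without a Spark installation using plain pickle, which supports appending multiple objects to one stream; this only illustrates the concept, not PySpark's actual framed wire format:

```python
import io
import pickle

def dump_stream(iterator, stream):
    # Write each element as its own pickle record
    for obj in iterator:
        pickle.dump(obj, stream)

def load_stream(stream):
    # Yield records until the stream is exhausted
    while True:
        try:
            yield pickle.load(stream)
        except EOFError:
            return

buf = io.BytesIO()
dump_stream(iter([1, "two", [3.0]]), buf)
buf.seek(0)
items = list(load_stream(buf))
print(items)  # → [1, 'two', [3.0]]
```

Streaming this way avoids materializing the whole collection in memory before serializing, which is the point of the iterator-based API.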
MarshalSerializer
Faster than PickleSerializer, but supports fewer datatypes.
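The trade-off can be seen with the standard library's marshal module, which MarshalSerializer is built on: marshal round-trips core built-in types quickly but, unlike pickle, cannot serialize instances of user-defined classes, so it would not work for the sklearn model above:

```python
import marshal
import pickle

# Built-in types (dict, list, int, str) round-trip fine with marshal
data = {"ints": [1, 2, 3], "name": "model"}
assert marshal.loads(marshal.dumps(data)) == data

class Model:
    pass

try:
    marshal.dumps(Model())  # custom class instances are not supported
    marshal_ok = True
except ValueError:
    marshal_ok = False  # marshal raises ValueError: unmarshallable object

restored = pickle.loads(pickle.dumps(Model()))  # pickle handles them
```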