
【spark】Model Persistence



October 6, 2019

Author: Guofei

Category: 1-1-Algorithm Platform, Article No.: 173


Copyright notice: This article was written by Guo Fei. Feel free to repost, but please include a link to the original and notify the author.
Original link: https://www.guofei.site/2019/10/06/spark_serialize.html


PickleSerializer

Prepare your model

from sklearn import linear_model
import numpy as np

lm = linear_model.LinearRegression()
x = np.random.rand(1000, 1)
y = x + 0.1 * np.random.rand(1000, 1)
lm.fit(x, y)

Model to text

from pyspark import PickleSerializer

ps = PickleSerializer()
model_str = ps.dumps(obj=lm)
# model_str is a bytes object, which can then be stored in Hive
# Storing to Hive is omitted here; first convert to str with str(model_str), then store
# When reading back from Hive, use eval() to convert back to bytes
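The str()/eval() round trip above works but is fragile. A common alternative (not from the original article) is to base64-encode the bytes before storing them as a string. Since PickleSerializer.dumps/loads are thin wrappers around pickle, plain pickle with a stand-in object illustrates the same round trip:

```python
import base64
import pickle

# Stand-in for a fitted model; the round trip is the same for any
# picklable object, including an sklearn estimator.
obj = {"coef": [1.0, 2.0], "intercept": 0.1}

raw = pickle.dumps(obj)                          # bytes
text = base64.b64encode(raw).decode("ascii")     # plain string, safe for a Hive column
restored = pickle.loads(base64.b64decode(text))  # back to the original object

assert restored == obj
```

Unlike eval(), base64.b64decode never executes its input, and the encoded string contains no quotes or control characters that could be mangled in transit.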

Text to model

from pyspark import PickleSerializer

ps = PickleSerializer()
model_load = ps.loads(model_str)
model_load.predict([[0.1]])

Additionally

These can also serialize an iterator, though I have not tried them yet:

ps.dump_stream
ps.load_stream
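dump_stream and load_stream move a whole sequence of objects through one stream. pyspark's versions add length framing on top, but plain pickle (a stand-in sketch, not the pyspark API) shows the underlying pattern:

```python
import pickle
from io import BytesIO

# Hypothetical stand-in for ps.dump_stream: write each item of an
# iterator into one stream, back to back.
buf = BytesIO()
for item in [1, "two", [3.0]]:
    pickle.dump(item, buf)

# Stand-in for ps.load_stream: read objects back until the stream ends.
buf.seek(0)
items = []
while True:
    try:
        items.append(pickle.load(buf))
    except EOFError:
        break

assert items == [1, "two", [3.0]]
```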

MarshalSerializer

Faster than PickleSerializer, but supports fewer data types.
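MarshalSerializer is backed by Python's built-in marshal module, which only handles primitive built-in types; arbitrary objects, such as a fitted sklearn model, cannot go through it. A quick stdlib sketch of the trade-off:

```python
import marshal

# Primitive built-ins round-trip through marshal and are fast to encode.
data = [1, 2.5, "three", b"bytes", {"k": (4, 5)}]
assert marshal.loads(marshal.dumps(data)) == data

# Arbitrary objects are not supported, so a fitted model cannot be
# persisted this way -- that is what PickleSerializer is for.
try:
    marshal.dumps(object())
    supported = True
except ValueError:  # "unmarshallable object"
    supported = False

assert supported is False
```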


Your support will encourage me to keep creating!
