
【spark】Model Persistence



October 6, 2019

Author: Guofei

Category: 1-1-Algorithm Platform, Article No.: 173


Copyright notice: This article was written by Guo Fei. Feel free to repost, but please include a link to the original and notify the author.
Original link: https://www.guofei.site/2019/10/06/spark_serialize.html


PickleSerializer

Prepare your model

from sklearn import linear_model
import numpy as np

lm = linear_model.LinearRegression()
x = np.random.rand(1000, 1)
y = x + 0.1 * np.random.rand(1000, 1)
lm.fit(x, y)

Model to text

from pyspark import PickleSerializer

ps = PickleSerializer()
model_str = ps.dumps(obj=lm)
# model_str is a bytes object, which can then be stored in Hive
# Storing to Hive is omitted here; first convert to str with str(model_str), then store
# When reading back from Hive, use eval() to convert back to bytes
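The str()/eval() round trip above works but is fragile. A common alternative (not from the original article) is to base64-encode the bytes before storing them as a string. Since PickleSerializer.dumps/loads are thin wrappers around pickle, plain pickle with a stand-in object illustrates the same round trip:

```python
import base64
import pickle

# Stand-in for a fitted model; the round trip is the same for any
# picklable object, including an sklearn estimator.
obj = {"coef": [1.0, 2.0], "intercept": 0.1}

raw = pickle.dumps(obj)                          # bytes
text = base64.b64encode(raw).decode("ascii")     # plain string, safe for a Hive column
restored = pickle.loads(base64.b64decode(text))  # back to the original object

assert restored == obj
```

Unlike eval(), base64.b64decode never executes its input, and the encoded string contains no quotes or control characters that could be mangled in transit.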

Text to model

from pyspark import PickleSerializer

ps = PickleSerializer()
model_load = ps.loads(model_str)
model_load.predict([[0.1]])

Additionally

These can also serialize an iterator, though I have not tried them yet:

ps.dump_stream
ps.load_stream
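dump_stream and load_stream move a whole sequence of objects through one stream. pyspark's versions add length framing on top, but plain pickle (a stand-in sketch, not the pyspark API) shows the underlying pattern:

```python
import pickle
from io import BytesIO

# Hypothetical stand-in for ps.dump_stream: write each item of an
# iterator into one stream, back to back.
buf = BytesIO()
for item in [1, "two", [3.0]]:
    pickle.dump(item, buf)

# Stand-in for ps.load_stream: read objects back until the stream ends.
buf.seek(0)
items = []
while True:
    try:
        items.append(pickle.load(buf))
    except EOFError:
        break

assert items == [1, "two", [3.0]]
```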

MarshalSerializer

Faster than PickleSerializer, but supports fewer data types.
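MarshalSerializer is backed by Python's built-in marshal module, which only handles primitive built-in types; arbitrary objects, such as a fitted sklearn model, cannot go through it. A quick stdlib sketch of the trade-off:

```python
import marshal

# Primitive built-ins round-trip through marshal and are fast to encode.
data = [1, 2.5, "three", b"bytes", {"k": (4, 5)}]
assert marshal.loads(marshal.dumps(data)) == data

# Arbitrary objects are not supported, so a fitted model cannot be
# persisted this way -- that is what PickleSerializer is for.
try:
    marshal.dumps(object())
    supported = True
except ValueError:  # "unmarshallable object"
    supported = False

assert supported is False
```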


Your support will encourage me to keep creating!
