A question about fast data discretization
source link: https://www.v2ex.com/t/831306
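I binned a one-dimensional vector Features and obtained the right edge of each bin: NBins=[0.3, 5.4, 7.6, inf]. Each bin is assigned a replacement value, e.g. [3.5, 7.8, 2.4, 3.6]. Discretizing the vector means mapping every element to the replacement value of its bin: all values less than 0.3 become 3.5, values between 0.3 and 5.4 become 7.8, and so on. What is a fast way to do this?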
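One vectorized way to express the mapping described above is to look up each element's bin index with `np.digitize` and then index into the array of replacement values. This is a minimal sketch, not the approach from the replies: the function name `discretize` and the sample data are hypothetical, and sending a value that sits exactly on an edge (such as 0.3) to the bin on its right is an assumption, since the question only says "less than 0.3".

```python
import numpy as np

def discretize(features, right_edges, values):
    """Map each element of `features` to the replacement value of its bin.

    `discretize` is a hypothetical helper name. Assumption: a value exactly
    on an edge goes to the bin to its right, i.e. bins are [left, right).
    """
    edges = np.asarray(right_edges, dtype=float)
    values = np.asarray(values, dtype=float)
    # Drop the trailing inf edge so that np.digitize returns indices
    # 0 .. len(values) - 1 for every input, including inf itself.
    inner_edges = edges[:-1]
    idx = np.digitize(features, inner_edges)
    # Fancy indexing builds a new array; the input is left untouched.
    return values[idx]

# Made-up sample data for illustration.
features = np.array([-1.0, 0.2, 0.3, 4.0, 5.4, 7.0, 9.9])
result = discretize(features, [0.3, 5.4, 7.6, np.inf], [3.5, 7.8, 2.4, 3.6])
print(result)
```

Because the bin lookup is a single binary search per element and the replacement is one indexing operation, there is no Python-level loop over bins, and no value is ever tested against a bin after it has been replaced.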
3 replies • 2022-01-30 10:46:15 +08:00
necomancer 18 hours 27 minutes ago
Use numpy.array (pandas.DataFrame)?
a[a < 0.3] = 3.5
a[np.logical_and(a < 5.4, a > 0.3)] = 7.8
necomancer 18 hours 24 minutes ago
You can add a left edge to NBins and then use a for loop:
vals = [3.5, 7.8, ...]
NBins = [-np.inf, 0.3, 5.4, ..., np.inf]
for l, r, val in zip(NBins[:-1], NBins[1:], vals):
    a[np.logical_and(a > l, a < r)] = val
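acone2003 1 hour 55 minutes ago
Thanks, necomancer, and happy New Year! I haven't run your code above yet, but I have a question. Take a value such as -1: it is replaced with 3.5 by a[a<0.3] =3.5. Would it then be replaced again, this time with 7.8, by the subsequent a[np.logical_and(a < 5.4, a > 0.3)]=7.8?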
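The concern can be checked on a small array. In the sketch below (the sample values are made up), the in-place version does replace twice: -1 first becomes 3.5, and because 3.5 lies between 0.3 and 5.4 it then becomes 7.8. One way to avoid this, assuming the loop structure from the reply is kept, is to compute every mask from the untouched input and write into a separate output array. The half-open interval [l, r) used here is an assumption about how values exactly on an edge should be handled.

```python
import numpy as np

# Made-up sample data for illustration.
a = np.array([-1.0, 2.0, 6.0])

# In-place version from the thread: the second mask is evaluated on the
# already-modified array, so the freshly written 3.5 matches it as well.
in_place = a.copy()
in_place[in_place < 0.3] = 3.5
in_place[np.logical_and(in_place < 5.4, in_place > 0.3)] = 7.8

# Variant that reads masks from the original `a` and writes to `out`,
# so earlier replacements cannot affect later bins.
edges = [-np.inf, 0.3, 5.4, 7.6, np.inf]
vals = [3.5, 7.8, 2.4, 3.6]
out = np.full_like(a, np.nan)  # anything matching no bin stays NaN
for l, r, val in zip(edges[:-1], edges[1:], vals):
    # Assumption: bins are half-open [l, r), so a value equal to an edge
    # falls into the bin on its right instead of being skipped.
    out[(a >= l) & (a < r)] = val

print(in_place)
print(out)
```

Here `in_place` ends up as 7.8, 7.8, 6.0, while `out` holds 3.5, 7.8, 2.4, which is the mapping the original question asks for.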