
[Notes] Chinese Word Segmentation in Go

source link: https://loli.fj.cn/2023/06/20/Go%E8%AF%AD%E8%A8%80%E5%AE%9E%E7%8E%B0%E4%B8%AD%E6%96%87%E5%88%86%E8%AF%8D/


2023-06-20

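Chinese word segmentation in Go using gojieba.

Install the dependency:

go get github.com/yanyiwu/gojieba

Import the package and create a segmenter:

import "github.com/yanyiwu/gojieba"
var jieba = gojieba.NewJieba()
// gojieba wraps a C++ library through cgo; call jieba.Free() once the segmenter is no longer needed.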

Full mode

<str>: the string to segment

var words []string = jieba.CutAll("<str>")

Before segmentation:
清华大学

After segmentation:
清华
大学
清华大学
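
Precise mode

// The second argument enables the HMM model for words that are not in the dictionary.
var words []string = jieba.Cut("<str>", true)

Before segmentation:
清华大学

After segmentation:
清华大学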
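
The difference between the two results above (overlapping words versus a single segmentation) can be illustrated with a toy dictionary scan. This is only a sketch and is not gojieba's algorithm: `toyDict`, `fullScan`, and `maxMatch` are hypothetical names, the dictionary holds three hand-picked words, and the non-overlapping variant uses plain forward maximum matching as a stand-in.

```go
package main

import "fmt"

// toyDict is a hypothetical three-word dictionary used only for illustration.
var toyDict = map[string]bool{"清华": true, "大学": true, "清华大学": true}

// fullScan lists every dictionary word found at any position,
// so the returned words may overlap.
func fullScan(s string, dict map[string]bool) []string {
	runes := []rune(s)
	var out []string
	for i := range runes {
		for j := i + 1; j <= len(runes); j++ {
			if w := string(runes[i:j]); dict[w] {
				out = append(out, w)
			}
		}
	}
	return out
}

// maxMatch performs forward maximum matching: at each position it takes the
// longest dictionary word and falls back to a single character otherwise.
func maxMatch(s string, dict map[string]bool) []string {
	runes := []rune(s)
	var out []string
	for i := 0; i < len(runes); {
		end := i + 1
		for j := len(runes); j > i+1; j-- {
			if dict[string(runes[i:j])] {
				end = j
				break
			}
		}
		out = append(out, string(runes[i:end]))
		i = end
	}
	return out
}

func main() {
	fmt.Println(fullScan("清华大学", toyDict)) // [清华 清华大学 大学]
	fmt.Println(maxMatch("清华大学", toyDict)) // [清华大学]
}
```

With this toy dictionary, the exhaustive scan returns every dictionary entry it finds, while maximum matching covers the input exactly once.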

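Adding words to the dictionary before segmenting

<word>: the custom word to add to the dictionary

jieba.AddWord("<word>")
var words []string = jieba.Cut("<str>", true)

Assigning a weight to a word added to the dictionary

<num>: the weight value

jieba.AddWordEx("<word>", <num>, "")
var words []string = jieba.Cut("<str>", true)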
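
When several custom words are needed, one option is to register them all in a single pass before the first segmentation call. The sketch below assumes that `*gojieba.Jieba` satisfies the hypothetical `wordAdder` interface, with the weight passed as an `int`. `customWord`, `registerWords`, and `recorder` are illustrative names, and `recorder` is a stand-in so the example runs without gojieba installed.

```go
package main

import "fmt"

// wordAdder is a hypothetical interface listing only the two methods used in
// this note; *gojieba.Jieba is assumed to satisfy it.
type wordAdder interface {
	AddWord(word string)
	AddWordEx(word string, freq int, tag string)
}

// customWord is a hypothetical record; Freq <= 0 means "no explicit weight".
type customWord struct {
	Word string
	Freq int
}

// registerWords adds every custom word, using AddWordEx only when a weight is set.
func registerWords(dst wordAdder, words []customWord) {
	for _, w := range words {
		if w.Freq > 0 {
			dst.AddWordEx(w.Word, w.Freq, "")
		} else {
			dst.AddWord(w.Word)
		}
	}
}

// recorder is a stand-in that logs calls instead of touching a real dictionary.
type recorder struct{ calls []string }

func (r *recorder) AddWord(word string) {
	r.calls = append(r.calls, "AddWord "+word)
}

func (r *recorder) AddWordEx(word string, freq int, tag string) {
	r.calls = append(r.calls, fmt.Sprintf("AddWordEx %s %d", word, freq))
}

func main() {
	r := &recorder{}
	registerWords(r, []customWord{{Word: "地鼠文档"}, {Word: "分词器", Freq: 100}})
	fmt.Println(r.calls)
}
```

In a real program the `recorder` would be replaced by the segmenter created with `gojieba.NewJieba()`, provided the assumption about its method set holds.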

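Search engine mode

// As with Cut, the second argument enables the HMM model.
var words []string = jieba.CutForSearch("<str>", true)

Before segmentation:
清华大学

After segmentation:
清华
大学
清华大学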

Part-of-speech tagging after segmentation

var words []string = jieba.Tag("<str>")
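
The strings returned by `Tag` appear to combine each word with its part-of-speech tag in a `word/tag` form. Assuming that format, a small helper can separate the two parts. `taggedWord` and `splitTag` are hypothetical names, and the sample inputs are written by hand rather than taken from real output.

```go
package main

import (
	"fmt"
	"strings"
)

// taggedWord is a hypothetical type holding one word and its tag.
type taggedWord struct {
	Word string
	Tag  string
}

// splitTag splits on the last "/" so that a word containing "/" keeps its text.
// An item without "/" is returned with an empty tag.
func splitTag(item string) taggedWord {
	i := strings.LastIndex(item, "/")
	if i < 0 {
		return taggedWord{Word: item}
	}
	return taggedWord{Word: item[:i], Tag: item[i+1:]}
}

func main() {
	// Hand-written samples in the assumed "word/tag" format.
	for _, item := range []string{"清华大学/nt", "很/d"} {
		tw := splitTag(item)
		fmt.Printf("%s -> %s\n", tw.Word, tw.Tag)
	}
}
```

Splitting on the last separator is a deliberate choice: if the segmented text itself contains a slash, only the trailing tag is cut off.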

Bilibili: 地鼠文档
yanyiwu/gojieba

