Implementing LIKE queries in Elasticsearch
source link: https://wakzz.cn/2019/05/09/elasticsearch/elasticsearch%E5%AE%9E%E7%8E%B0like%E6%9F%A5%E8%AF%A2/
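The goal is to make Elasticsearch queries behave like a MySQL LIKE query. For example, a record whose value is hello中国233 should be found both by searching for 中国 and by searching for llo.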
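However, Elasticsearch full-text queries match against analyzed tokens, and by default hello中国233 is tokenized as hello, 中, 国, 233. A query for hello matches the record, but a query for llo does not, because llo is not one of the indexed tokens.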
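The mismatch can be reproduced outside Elasticsearch with a small Python sketch. The function `approx_standard_tokens` is a hypothetical stand-in written for this example only: it treats runs of ASCII letters and digits as one token and each CJK ideograph as its own token, which happens to agree with the standard analyzer on this input but is not the real tokenization algorithm.

```python
import re


def approx_standard_tokens(text: str) -> list:
    """Rough approximation of the default tokenization for this example.

    Assumption: ASCII alphanumeric runs form one token and every CJK
    ideograph is a separate token. The real standard tokenizer follows
    Unicode text segmentation rules and handles far more cases.
    """
    pattern = r"[A-Za-z0-9]+|[\u4e00-\u9fff]"
    return [token.lower() for token in re.findall(pattern, text)]


tokens = approx_standard_tokens("hello中国233")
print(tokens)             # ['hello', '中', '国', '233']
print("hello" in tokens)  # True: the whole word is an indexed token
print("llo" in tokens)    # False: a substring of a token is not a token
```

Lookups happen on whole tokens, so a substring such as llo has nothing to match against.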
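Because the tokens are too coarse, token-based queries cannot match substrings. The fix is to tokenize the content one character at a time, so that hello中国233 becomes h, e, l, l, o, 中, 国, 2, 3, 3.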
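Elasticsearch does not tokenize this way by default, but a custom analyzer can achieve it: a character filter inserts a space after every character, and the whitespace tokenizer then splits the string into single characters.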
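The two stages of that analyzer can be imitated in plain Python to see what they produce. This is only a sketch of the idea: `char_analyzer` is a hypothetical helper, and Python's `re.sub` with `\1` stands in for the `pattern_replace` character filter, which uses Java regex syntax with `$1`.

```python
import re


def char_analyzer(text: str) -> list:
    """Imitate the custom analyzer: space out characters, then split."""
    # Character filter stage: the lazy group (.+?) matches one character
    # at a time, and the replacement appends a space after each match.
    spaced = re.sub(r"(.+?)", r"\1 ", text)
    # Tokenizer stage: split on whitespace, as the whitespace tokenizer does.
    return spaced.split()


print(char_analyzer("hello中国233"))
# ['h', 'e', 'l', 'l', 'o', '中', '国', '2', '3', '3']
```

Note that whitespace already present in the input is discarded by the split, so spaces themselves are not searchable with this approach.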
PUT /like_search
{
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

GET /like_search/_analyze
{
  "text": [
    "hello中国233"
  ]
}

The _analyze response shows the default tokens:

{
  "tokens": [
    {
      "token": "hello",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 6,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "233",
      "start_offset": 7,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 3
    }
  ]
}

Elasticsearch uses the standard analyzer by default, so the following query for llo does not find the hello中国233 record.
GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}

Custom analyzer
# Delete the index created earlier; PUT fails if the index already exists.
DELETE /like_search

PUT /like_search
{
  "settings": {
    "analysis": {
      "analyzer": {
        "char_analyzer": {
          "char_filter": [
            "split_by_whitespace_filter"
          ],
          "tokenizer": "whitespace"
        }
      },
      "char_filter": {
        "split_by_whitespace_filter": {
          "type": "pattern_replace",
          "pattern": "(.+?)",
          "replacement": "$1 "
        }
      }
    }
  },
  "mappings": {
    "like_search_type": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "char_analyzer"
        }
      }
    }
  }
}

PUT /like_search/like_search_type/1
{
  "name": "hello中国233"
}

GET /like_search/_analyze
{
  "analyzer": "char_analyzer",
  "text": [
    "hello中国233"
  ]
}

The _analyze response now contains one token per character:

{
  "tokens": [
    {
      "token": "h",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "e",
      "start_offset": 1,
      "end_offset": 1,
      "type": "word",
      "position": 1
    },
    {
      "token": "l",
      "start_offset": 2,
      "end_offset": 2,
      "type": "word",
      "position": 2
    },
    {
      "token": "l",
      "start_offset": 3,
      "end_offset": 3,
      "type": "word",
      "position": 3
    },
    {
      "token": "o",
      "start_offset": 4,
      "end_offset": 4,
      "type": "word",
      "position": 4
    },
    {
      "token": "中",
      "start_offset": 5,
      "end_offset": 5,
      "type": "word",
      "position": 5
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 6,
      "type": "word",
      "position": 6
    },
    {
      "token": "2",
      "start_offset": 7,
      "end_offset": 7,
      "type": "word",
      "position": 7
    },
    {
      "token": "3",
      "start_offset": 8,
      "end_offset": 8,
      "type": "word",
      "position": 8
    },
    {
      "token": "3",
      "start_offset": 9,
      "end_offset": 9,
      "type": "word",
      "position": 9
    }
  ]
}

With the custom analyzer in place, the following query for llo now finds the hello中国233 record.
GET /like_search/_search
{
  "query": {
    "match_phrase": {
      "name": "llo"
    }
  }
}
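
Why match_phrase gives LIKE-style results here can be illustrated with a simplified model in Python. The helper `phrase_match` is hypothetical and only checks that the query tokens appear at consecutive positions in the document tokens, which is the core idea of a phrase query; it ignores scoring, slop, and everything else Elasticsearch does.

```python
def phrase_match(doc_tokens: list, query_tokens: list) -> bool:
    """Return True if query_tokens occur contiguously in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    # Slide a window of the query's length across the document tokens.
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


# Token lists copied from the two _analyze responses in this article.
standard_doc = ["hello", "中", "国", "233"]
char_doc = list("hello中国233")

print(phrase_match(standard_doc, ["llo"]))  # False: no such token
print(phrase_match(char_doc, list("llo")))  # True: l, l, o are adjacent
print(phrase_match(char_doc, list("中国")))  # True
print(phrase_match(char_doc, list("lo2")))  # False: o and 2 are not adjacent
```

The last case shows why a phrase query is used rather than a plain match query: with single-character tokens, a match query would by default accept any record that contains at least one of the queried characters, regardless of order or adjacency.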