4

Elasticsearch 导入数据

 2 years ago
source link: https://einverne.github.io/post/2022/07/elasticsearch-import-data.html
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Elasticsearch 既然作为一个全文检索引擎,那么自然需要将数据导入,让 Elasticsearch 去索引。

Elasticsearch(后简写为 ES) 的基本单元是文档,使用 JSON 来描述。

有很多方法可以把数据导入到 ES:

  • RESTful 接口
  • Bulk API 批量导入
  • elasticsearch-dump
  • Logstash 将收集的数据导入

Prerequisite

导入数据前要了解的知识。

  • Cluster,集群,通常由多个节点组成 ES 集群
  • Index,通常称为索引,文档的属性
  • Document,文档,JSON 格式定义的数据
  • Shard,分片,索引会水平分片
  • Replicas,ES 允许用户创建索引和分片的 Replicas

RESTful 接口导入

如果数据文件比较简单,只有单层 JSON 结构,并且小于 1MB,可以使用 POST 请求直接将数据提交到 ES。

假设有数据 accounts.json

{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}

然后可以使用如下命令导入:

curl -u admin:my_elastic_pass -H 'Content-Type: application/json' -XPOST 'http://localhost:9200/accounts/_doc/_bulk?pretty' --data-binary @accounts.json
{
  "_index" : "accounts",
  "_id" : "_bulk",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

可以通过如下命令来查看所有索引的情况:

curl "localhost:9200/_cat/indices?v"

然后访问 Kibana,在后台就可以查询导入的内容:

kibana

这里使用官方的样例数据:

wget https://download.elastic.co/demos/kibana/gettingstarted/accounts.zip
unzip accounts.zip
curl -XPOST 'localhost:9200/bank/account/_bulk?pretty' --data-binary @accounts.json

需要注意的是如果使用 Bulk 批量导入,那么格式需要按照:

action_and_meta_data\n
optional_source\n
action_and_meta_data\n
optional_source\n
....
action_and_meta_data\n
optional_source\n

或者可以在 Kibana 后台,点击侧边栏 Dev Tools,然后在编辑框中输入:

POST bank/_bulk
{"index":{"_id":"1"}}
{"account_number":1,"balance":39225,"firstname":"Amber","lastname":"Duke","age":32,"gender":"M","address":"880 Holmes Lane","employer":"Pyrami","email":"[email protected]","city":"Brogan","state":"IL"}
{"index":{"_id":"6"}}
{"account_number":6,"balance":5686,"firstname":"Hattie","lastname":"Bond","age":36,"gender":"M","address":"671 Bristol Street","employer":"Netagy","email":"[email protected]","city":"Dante","state":"TN"}
省略...

最后点击执行。

kibana devtool bulk import

执行成功后,可以运行 curl localhost:9200/bank/_search,如果返回值中 value 是导入的条数就表示成功了。

elasticsearch-dump

Logstash

Logstash 是开源的服务器端数据处理管道,能够同时从多个来源采集数据、转换数据,然后将数据发送到 Elasticsearch 中。Logstash 的官方文档请参见:https://www.elastic.co/guide/en/logstash/current/getting-started-with-logstash.html

reference

  • <https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK