
elasticsearch's date type is a little strange - 无风听海

source link: https://www.cnblogs.com/wufengtinghai/p/17121480.html


1. Introduction to the Date type

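elasticsearch carries data as JSON, and JSON has no data type corresponding to a date. Nevertheless, elasticsearch can handle date values carried in JSON in the following three forms:

  • a string in a specific date format;
  • a long number representing milliseconds-since-the-epoch;
  • a long number representing seconds-since-the-epoch (this requires setting the field's format to epoch_second, otherwise the number is read as milliseconds);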

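When indexing, elasticsearch converts the incoming value to UTC and stores it internally as a long number of milliseconds-since-the-epoch; when querying, it internally converts a date query into a range query on that number.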
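A minimal java.time sketch of the conversion described above, turning each of the three input forms into the same kind of value: a long of UTC epoch milliseconds. It only illustrates the idea; the class and methods are hypothetical rather than elasticsearch's own code, and only yyyy-MM-dd strings and ISO-8601 instants ending in Z are handled:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

public class DateToMillis {
    // Hypothetical helper: a date-only string means the start of that day in UTC,
    // anything else is read as an ISO-8601 instant such as 2015-01-01T12:10:30Z
    static long toEpochMillis(String value) {
        if (value.length() == 10) {
            return LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        }
        return Instant.parse(value).toEpochMilli();
    }

    // A seconds-since-the-epoch number is scaled up to milliseconds
    static long secondsToMillis(long epochSeconds) {
        return epochSeconds * 1000L;
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("2015-01-01"));           // 1420070400000
        System.out.println(toEpochMillis("2015-01-01T12:10:30Z")); // 1420114230000
        System.out.println(secondsToMillis(1420070400L));          // 1420070400000
        // A milliseconds-since-the-epoch number such as 1420070400001 is kept unchanged
    }
}
```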

2. Preparing test data

Create the mapping, setting the type of create_date to date:

PUT my_date_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "create_date": {
          "type": "date" 
        }
      }
    }
  }
}

Index the following three documents:

PUT my_date_index/_doc/1
{ "create_date": "2015-01-01" } 

PUT my_date_index/_doc/2
{ "create_date": "2015-01-01T12:10:30Z" } 

PUT my_date_index/_doc/3
{ "create_date": 1420070400001 }

3. The odd behavior of date queries

We would like the following query to match the record for 2015-01-01:

POST my_date_index/_search
{
  "query": {
    "term": {
      "create_date": "2015-01-01"
    }
  }
}

The result shows that all three documents were matched, not just document 1:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "create_date" : 1420070400001
        }
      }
    ]
  }
}

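Profiling the same query shows that elasticsearch really does rewrite it internally as the range query create_date:[1420070400000 TO 1420156799999], which covers the whole of 2015-01-01 in UTC: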

POST my_date_index/_search
{
  "profile": "true", 
  "query": {
    "term": {
      "create_date": "2015-01-01"
    }
  }
}


  {
    "id" : "[eD2KQtMGSla7jzJQBQVAfQ][my_date_index][0]",
    "searches" : [
      {
        "query" : [
          {
            "type" : "IndexOrDocValuesQuery",
            "description" : "create_date:[1420070400000 TO 1420156799999]",
            "time_in_nanos" : 2101,
            "breakdown" : {
              "score" : 0,
              "build_scorer_count" : 0,
              "match_count" : 0,
              "create_weight" : 2100,
              "next_doc" : 0,
              "match" : 0,
              "create_weight_count" : 1,
              "next_doc_count" : 0,
              "score_count" : 0,
              "build_scorer" : 0,
              "advance" : 0,
              "advance_count" : 0
            }
          }
        ],
        "rewrite_time" : 2200,
        "collector" : [
          {
            "name" : "CancellableCollector",
            "reason" : "search_cancelled",
            "time_in_nanos" : 700,
            "children" : [
              {
                "name" : "SimpleTopScoreDocCollector",
                "reason" : "search_top_hits",
                "time_in_nanos" : 200
              }
            ]
          }
        ]
      }
    ],
    "aggregations" : [ ]
  }

Next, let's look at how a term query on a date field is executed.

termQuery simply calls rangeQuery, passing the given date as both the lower and the upper bound, with both bounds inclusive:

DateFieldType

@Override
public Query termQuery(Object value, @Nullable QueryShardContext context) {
    Query query = rangeQuery(value, value, true, true, ShapeRelation.INTERSECTS, null, null, context);
    if (boost() != 1f) {
        query = new BoostQuery(query, boost());
    }
    return query;
}

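rangeQuery calls parseToMilliseconds to compute the two bounds. Note the roundUp argument: the lower bound is parsed with !includeLower and the upper bound with includeUpper, so for the inclusive bounds of a term query the lower bound is rounded down and the upper bound is rounded up: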

DateFieldType

@Override
public Query rangeQuery(Object lowerTerm, Object upperTerm, boolean includeLower, boolean includeUpper, ShapeRelation relation,
                        @Nullable DateTimeZone timeZone, @Nullable DateMathParser forcedDateParser, QueryShardContext context) {
    failIfNotIndexed();
    if (relation == ShapeRelation.DISJOINT) {
        throw new IllegalArgumentException("Field [" + name() + "] of type [" + typeName() +
                "] does not support DISJOINT ranges");
    }
    DateMathParser parser = forcedDateParser == null
            ? dateMathParser
            : forcedDateParser;
    long l, u;
    if (lowerTerm == null) {
        l = Long.MIN_VALUE;
    } else {
        l = parseToMilliseconds(lowerTerm, !includeLower, timeZone, parser, context);
        if (includeLower == false) {
            ++l;
        }
    }
    if (upperTerm == null) {
        u = Long.MAX_VALUE;
    } else {
        u = parseToMilliseconds(upperTerm, includeUpper, timeZone, parser, context);
        if (includeUpper == false) {
            --u;
        }
    }
    Query query = LongPoint.newRangeQuery(name(), l, u);
    if (hasDocValues()) {
        Query dvQuery = SortedNumericDocValuesField.newSlowRangeQuery(name(), l, u);
        query = new IndexOrDocValuesQuery(query, dvQuery);
    }
    return query;
}
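The bound handling in termQuery and rangeQuery can be reduced to a few lines. This is a simplified sketch, not elasticsearch code: the hypothetical Parser interface stands in for parseToMilliseconds and receives the roundUp flag exactly as rangeQuery passes it, and the toy DAY_PARSER only understands yyyy-MM-dd in UTC:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

public class TermAsRange {
    // Hypothetical stand-in for parseToMilliseconds(value, roundUp, ...)
    interface Parser {
        long parse(String value, boolean roundUp);
    }

    // Mirrors the bound logic of rangeQuery; returns inclusive {lower, upper} in millis
    static long[] rangeBounds(String lowerTerm, String upperTerm,
                              boolean includeLower, boolean includeUpper, Parser parser) {
        long l;
        if (lowerTerm == null) {
            l = Long.MIN_VALUE;
        } else {
            l = parser.parse(lowerTerm, !includeLower);
            if (!includeLower) {
                ++l; // exclusive lower bound: start one millisecond later
            }
        }
        long u;
        if (upperTerm == null) {
            u = Long.MAX_VALUE;
        } else {
            u = parser.parse(upperTerm, includeUpper);
            if (!includeUpper) {
                --u; // exclusive upper bound: stop one millisecond earlier
            }
        }
        return new long[] {l, u};
    }

    // Mirrors termQuery: the same value becomes both inclusive bounds
    static long[] termBounds(String value, Parser parser) {
        return rangeBounds(value, value, true, true, parser);
    }

    // Toy parser for yyyy-MM-dd only: round down to 00:00:00.000 or up to 23:59:59.999 UTC
    static final Parser DAY_PARSER = (value, roundUp) -> {
        long start = LocalDate.parse(value).atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return roundUp ? start + 86_399_999L : start;
    };

    public static void main(String[] args) {
        long[] term = termBounds("2015-01-01", DAY_PARSER);
        System.out.println(term[0] + " TO " + term[1]); // 1420070400000 TO 1420156799999
    }
}
```

With includeLower = false (a gt bound), the value is parsed with roundUp = true and then incremented, so in this sketch gt 2015-01-01 starts at 1420156800000, the first millisecond of 2015-01-02, and excludes the whole day.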

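The code below shows that the parsed fields of the lower bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC), while those of the upper bound overwrite the corresponding fields of new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC). Any field the input does not specify keeps its base value, so querying 2015-01-01 is equivalent to querying every record within that day, from 00:00:00.000 to 23:59:59.999 UTC: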

JodaDateMathParser

private long parseDateTime(String value, DateTimeZone timeZone, boolean roundUpIfNoTime) {
    DateTimeFormatter parser = dateTimeFormatter.parser;
    if (timeZone != null) {
        parser = parser.withZone(timeZone);
    }
    try {
        MutableDateTime date;
        // We use 01/01/1970 as a base date so that things keep working with date
        // fields that are filled with times without dates
        if (roundUpIfNoTime) {
            date = new MutableDateTime(1970, 1, 1, 23, 59, 59, 999, DateTimeZone.UTC);
        } else {
            date = new MutableDateTime(1970, 1, 1, 0, 0, 0, 0, DateTimeZone.UTC);
        }
        final int end = parser.parseInto(date, value, 0);
        if (end < 0) {
            int position = ~end;
            throw new IllegalArgumentException("Parse failure at index [" + position + "] of [" + value + "]");
        } else if (end != value.length()) {
            throw new IllegalArgumentException("Unrecognized chars at the end of [" + value + "]: [" + value.substring(end) + "]");
        }
        return date.getMillis();
    } catch (IllegalArgumentException e) {
        throw new ElasticsearchParseException("failed to parse date field [{}] with format [{}]", e, value,
            dateTimeFormatter.pattern());
    }
}
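The same idea, filling every field the input leaves out from a base value, can be sketched with java.time instead of Joda's MutableDateTime. This is an approximation, not elasticsearch's parser: the class is hypothetical, it accepts only uuuu-MM-dd with an optional 'T'HH:mm:ss and trailing Z, and it always treats the value as UTC. parseDefaulting supplies the missing hour, minute, second and fraction, using 0 when rounding down and 23:59:59.999 when rounding up:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class RoundingParser {
    // Accepts uuuu-MM-dd with an optional 'T'HH:mm:ss and an optional trailing 'Z'.
    // The value is always treated as UTC; other offsets are not supported here.
    static DateTimeFormatter formatter(boolean roundUp) {
        return new DateTimeFormatterBuilder()
                .appendPattern("uuuu-MM-dd")
                .optionalStart()
                .appendLiteral('T')
                .appendPattern("HH:mm:ss")
                .optionalStart()
                .appendLiteral('Z')
                .optionalEnd()
                .optionalEnd()
                // Fields missing from the input take the base value: the start of the
                // day when rounding down, 23:59:59.999 when rounding up
                .parseDefaulting(ChronoField.HOUR_OF_DAY, roundUp ? 23 : 0)
                .parseDefaulting(ChronoField.MINUTE_OF_HOUR, roundUp ? 59 : 0)
                .parseDefaulting(ChronoField.SECOND_OF_MINUTE, roundUp ? 59 : 0)
                .parseDefaulting(ChronoField.NANO_OF_SECOND, roundUp ? 999_000_000 : 0)
                .toFormatter();
    }

    static long parseMillis(String value, boolean roundUp) {
        LocalDateTime dateTime = LocalDateTime.parse(value, formatter(roundUp));
        return dateTime.toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(parseMillis("2015-01-01", false));           // 1420070400000
        System.out.println(parseMillis("2015-01-01", true));            // 1420156799999
        System.out.println(parseMillis("2015-01-01T12:10:30Z", false)); // 1420114230000
        System.out.println(parseMillis("2015-01-01T12:10:30Z", true));  // 1420114230999
    }
}
```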

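In practice most dates are stored with second precision, so giving the query value to the second is usually enough to match just the intended record: the query then only covers the milliseconds of that one second. If it still matches several records, the data needs to be stored with millisecond precision, and the query value must include the milliseconds as well: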

POST my_date_index/_search
{
  "query": {
    "term": {
      "create_date": "2015-01-01T12:10:30Z"
    }
  }
}

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      }
    ]
  }
}
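How wide the matched window is depends only on the precision of the query value. Assuming UTC, filling the missing fields as above is equivalent to truncating the value to its precision and extending it to the last millisecond of that unit; this hypothetical sketch uses that truncation technique with java.time's Instant.truncatedTo rather than elasticsearch's parser:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class PrecisionWidth {
    // Inclusive [lower, upper] millis covered by a term query whose value is precise to `unit`:
    // every finer field the value leaves out ranges from its minimum to its maximum
    static long[] coveredRange(Instant value, ChronoUnit unit) {
        Instant lower = value.truncatedTo(unit);
        Instant upper = lower.plus(1, unit).minusMillis(1);
        return new long[] {lower.toEpochMilli(), upper.toEpochMilli()};
    }

    public static void main(String[] args) {
        Instant doc2 = Instant.parse("2015-01-01T12:10:30Z");
        ChronoUnit[] precisions = {ChronoUnit.DAYS, ChronoUnit.SECONDS, ChronoUnit.MILLIS};
        // Widths printed: 86400000, 1000 and 1 ms respectively
        for (ChronoUnit unit : precisions) {
            long[] r = coveredRange(doc2, unit);
            System.out.println(unit + ": " + (r[1] - r[0] + 1) + " ms wide");
        }
    }
}
```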

4. Customizing the parse format of date strings

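By default, a date field in elasticsearch accepts either a string in the strict_date_optional_time format or a long number in epoch_millis; the formats separated by || are tried in order: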

public static final DateFormatter DEFAULT_DATE_TIME_FORMATTER = DateFormatter.forPattern("strict_date_optional_time||epoch_millis");

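strict_date_optional_time can be read in two parts:

  • strict: the year, month and day must have exactly 4, 2 and 2 digits, padded with leading zeros where needed, for example 2023-01-23;
  • date_optional_time: the string must contain a date part, while the time part may be omitted.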

The complete syntax accepted by strict_date_optional_time is as follows:

 date-opt-time     = date-element ['T' [time-element] [offset]]
 date-element      = std-date-element | ord-date-element | week-date-element
 std-date-element  = yyyy ['-' MM ['-' dd]]
 ord-date-element  = yyyy ['-' DDD]
 week-date-element = xxxx '-W' ww ['-' e]
 time-element      = HH [minute-element] | [fraction]
 minute-element    = ':' mm [second-element] | [fraction]
 second-element    = ':' ss [fraction]
 fraction          = ('.' | ',') digit+
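The strict prefix amounts to fixed-width numeric fields. The sketch below imitates the difference with java.time's DateTimeFormatterBuilder; both formatters are hypothetical approximations covering only the yyyy-MM-dd form, not elasticsearch's actual format definitions:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.DateTimeParseException;
import java.time.temporal.ChronoField;

public class StrictWidths {
    // "strict": exactly a 4-digit year, 2-digit month and 2-digit day
    static final DateTimeFormatter STRICT = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .toFormatter();

    // Looser variant: each numeric field may have a variable number of digits
    static final DateTimeFormatter LOOSE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR)
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR)
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH)
            .toFormatter();

    static boolean accepts(DateTimeFormatter formatter, String value) {
        try {
            LocalDate.parse(value, formatter);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(STRICT, "2023-01-23")); // true
        System.out.println(accepts(STRICT, "2023-1-23"));  // false: month is not 2 digits
        System.out.println(accepts(LOOSE, "2023-1-23"));   // true
    }
}
```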

When we search with 2015/01/01, elasticsearch cannot parse the value and returns an error:

POST my_date_index/_search
{
  "profile": "true", 
  "query": {
    "term": {
      "create_date": "2015/01/01"
    }
  }
}

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_date_index",
        "node": "eD2KQtMGSla7jzJQBQVAfQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "failed to create query: {\n  \"term\" : {\n    \"create_date\" : {\n      \"value\" : \"2015/01/01\",\n      \"boost\" : 1.0\n    }\n  }\n}",
          "index_uuid": "9MTRkZcMTnK8GgK9vKwUuA",
          "index": "my_date_index",
          "caused_by": {
            "type": "parse_exception",
            "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
            "caused_by": {
              "type": "illegal_argument_exception",
              "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
            }
          }
        }
      }
    ],
    "caused_by": {
      "type": "parse_exception",
      "reason": "failed to parse date field [2015/01/01] with format [strict_date_optional_time||epoch_millis]",
      "caused_by": {
        "type": "illegal_argument_exception",
        "reason": "Unrecognized chars at the end of [2015/01/01]: [/01/01]"
      }
    }
  },
  "status": 400
}
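The message comes from the end checks in parseDateTime shown earlier: the year-only branch of std-date-element consumes 2015, parsing stops at the '/', and the leftover /01/01 is reported as unrecognized. A java.time sketch of that behaviour, under simplifying assumptions: the class is hypothetical, only the std-date-element branch of the grammar is modelled, || is treated as "try each format in order", and when every alternative fails the first format's error is reported:

```java
import java.text.ParsePosition;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;

public class FallbackParser {
    // Approximates std-date-element = yyyy ['-' MM ['-' dd]] from the grammar above
    static final DateTimeFormatter STD_DATE = new DateTimeFormatterBuilder()
            .appendValue(ChronoField.YEAR, 4)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.MONTH_OF_YEAR, 2)
            .optionalStart()
            .appendLiteral('-')
            .appendValue(ChronoField.DAY_OF_MONTH, 2)
            .optionalEnd()
            .optionalEnd()
            .toFormatter();

    // Parse as much as possible, then reject leftovers, like the end checks in parseDateTime
    static void checkDate(String value) {
        ParsePosition pos = new ParsePosition(0);
        if (STD_DATE.parseUnresolved(value, pos) == null) {
            throw new IllegalArgumentException(
                    "Parse failure at index [" + pos.getErrorIndex() + "] of [" + value + "]");
        }
        if (pos.getIndex() != value.length()) {
            throw new IllegalArgumentException("Unrecognized chars at the end of [" + value + "]: ["
                    + value.substring(pos.getIndex()) + "]");
        }
    }

    // Models "strict_date_optional_time||epoch_millis": try each format in order and
    // report the first format's error if none of them accepts the value
    static String matchingFormat(String value) {
        try {
            checkDate(value);
            return "strict_date_optional_time";
        } catch (IllegalArgumentException dateError) {
            try {
                Long.parseLong(value);
                return "epoch_millis";
            } catch (NumberFormatException notANumber) {
                throw dateError;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(matchingFormat("2015-01-01"));    // strict_date_optional_time
        System.out.println(matchingFormat("1420070400001")); // epoch_millis
        try {
            matchingFormat("2015/01/01");
        } catch (IllegalArgumentException e) {
            // Unrecognized chars at the end of [2015/01/01]: [/01/01]
            System.out.println(e.getMessage());
        }
    }
}
```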

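We can specify the format either in the mapping or in the search request. The term query has no format parameter, so the example below expresses the same day as a range query with a custom format: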

POST my_date_index/_search
{
    "query": {
        "range" : {
            "create_date" : {
                "gte": "2015/01/01",
                "lte": "2015/01/01",
                "format": "yyyy/MM/dd"
            }
        }
    }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01T12:10:30Z"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "create_date" : "2015-01-01"
        }
      },
      {
        "_index" : "my_date_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "create_date" : 1420070400001
        }
      }
    ]
  }
}
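All three documents match because gte is parsed rounding down and lte rounding up, exactly like the two bounds of the term query earlier, so the custom format only changes how the string is read. A sketch with java.time's DateTimeFormatter.ofPattern, assuming a date-only pattern and UTC; the helper is hypothetical:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class CustomFormatRange {
    static final long DAY_MILLIS = 86_400_000L;

    // Hypothetical helper: parse a date-only value with a custom pattern, then take the
    // first (roundUp = false) or last (roundUp = true) millisecond of that UTC day
    static long parseDay(String value, String pattern, boolean roundUp) {
        LocalDate day = LocalDate.parse(value, DateTimeFormatter.ofPattern(pattern));
        long start = day.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        return roundUp ? start + DAY_MILLIS - 1 : start;
    }

    public static void main(String[] args) {
        long gte = parseDay("2015/01/01", "yyyy/MM/dd", false); // gte: round down
        long lte = parseDay("2015/01/01", "yyyy/MM/dd", true);  // lte: round up
        System.out.println(gte + " TO " + lte); // 1420070400000 TO 1420156799999
    }
}
```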
