Learning MongoDB: A Rich Variety of Indexes

Published today at 16:07

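MongoDB indexes serve the same purpose as MySQL indexes, and the optimization principles are largely the same. MySQL indexes can be classified roughly as follows:

Single-column index - composite index
Primary key index (clustered index) - non-primary-key index (non-clustered index)

Beyond these basic categories, MongoDB also offers several special index types, such as multikey (array) indexes, sparse indexes, geospatial indexes, and TTL indexes.

To make the tests below easier, we insert the following data with a script: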

for(var i = 0;i < 100000;i++){
    db.users.insertOne({
        username: "user"+i,
        age: Math.random() * 100,
        sex: i % 2,
        phone: 18468150001+i
    });
}

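A single-field index covers exactly one field and is the most basic kind of index.

Create a single-field index on the username field of the collection. MongoDB automatically names it username_1.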

db.users.createIndex({username:1})
'username_1'

After creating the index, look at the query plan for a query on username. A stage of IXSCAN means that an index scan was used.

db.users.find({username:"user40001"}).explain()
{ 
   queryPlanner: 
   { 
     winningPlan: 
     { 
        ......
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { username: 1 },
           indexName: 'username_1',
           ......
        } 
     },
     rejectedPlans: []
   },
   ......
   ok: 1 
}
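
When comparing several plans it can help to reduce an explain() result to its chain of stages. The following is a minimal sketch in plain JavaScript, assuming the output has the shape shown above with a single inputStage per level; planStages and sampleExplain are hypothetical names, and plans with several child stages or a different layout are not handled.

```javascript
// Hypothetical helper: collect the stage names of a winning plan, outermost first.
function planStages(explainOutput) {
  const stages = [];
  let node = explainOutput.queryPlanner.winningPlan;
  while (node) {
    stages.push(node.stage);
    node = node.inputStage; // undefined at the leaf stage, which ends the loop
  }
  return stages;
}

// Trimmed copy of the explain output shown above (elided fields are left out).
const sampleExplain = {
  queryPlanner: {
    winningPlan: {
      stage: "FETCH",
      inputStage: {
        stage: "IXSCAN",
        keyPattern: { username: 1 },
        indexName: "username_1"
      }
    },
    rejectedPlans: []
  },
  ok: 1
};

console.log(planStages(sampleExplain).join(" <- ")); // FETCH <- IXSCAN
```

A chain ending in IXSCAN means the query read the index first; a lone COLLSCAN means it scanned the whole collection.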

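An important rule of index optimization is to build indexes on high-cardinality fields. Cardinality is the number of distinct values a field holds. If age held only the integers 0-99, the field would have 100 distinct values, that is, a cardinality of 100. (The insert script above actually stores Math.random() * 100, a fractional number, so the real cardinality of age in this collection is far higher.) The sex field only takes the values 0 and 1, so its cardinality is 2, which is very low. An index on such a field is not selective: each key matches about half of the collection, so the index saves little work and the planner may not use it.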
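
To make the idea of cardinality concrete, here is a small sketch in plain JavaScript that counts distinct values per field on an in-memory sample. The cardinality helper and the sampleUsers array are hypothetical, and the sample uses integer ages, which is an assumption that differs from the insert script.

```javascript
// Hypothetical helper: count the distinct values of one field across documents.
function cardinality(docs, field) {
  return new Set(docs.map((doc) => doc[field])).size;
}

// In-memory sample shaped like the users collection.
// Assumption: ages are integers 0-99 here, unlike the fractional ages in the script.
const sampleUsers = [];
for (let i = 0; i < 1000; i++) {
  sampleUsers.push({ username: "user" + i, age: i % 100, sex: i % 2 });
}

console.log(cardinality(sampleUsers, "username")); // 1000
console.log(cardinality(sampleUsers, "age"));      // 100
console.log(cardinality(sampleUsers, "sex"));      // 2
```

Against a real collection, db.users.distinct("sex").length should report the same kind of number, as long as the list of distinct values is small enough to return in one result.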

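Next, create an index on the sex field and check the execution plan. In the author's run the query used a full collection scan (COLLSCAN) rather than the index. Plans can vary with the MongoDB version and the data, so it is worth running explain() yourself.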

db.users.createIndex({sex:1})
'sex_1'

db.users.find({sex:1}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'COLLSCAN',
        filter: { sex: { '$eq': 1 } },
        direction: 'forward' 
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}

A compound index contains multiple fields. Below, create an index on the age and sex fields:

db.users.createIndex({age:1,sex:1})
'age_1_sex_1'

Then run a query on both fields and check the execution plan. The query uses this index as expected.

db.users.find({age:23,sex:1}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { age: 1, sex: 1 },
           indexName: 'age_1_sex_1',
           .......
           indexBounds: { age: [ '[23, 23]' ], sex: [ '[1, 1]' ] } 
        } 
     },
     rejectedPlans: [], 
  },
  ......
  ok: 1 
 }

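A multikey index is an index on an array field (the author also calls it an array index). For testing, add an array field to some of the documents in the users collection.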

db.users.updateOne({username:"user1"},{$set:{hobby:["唱歌","篮球","rap"]}})
......

Create the index on the array field and check the execution plan. Note that isMultiKey: true indicates that the index used is a multikey index.

db.users.createIndex({hobby:1})
'hobby_1'

db.users.find({hobby:{$elemMatch:{$eq:"钓鱼"}}}).explain()
{ 
   queryPlanner: 
   { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        filter: { hobby: { '$elemMatch': { '$eq': '钓鱼' } } },
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { hobby: 1 },
           indexName: 'hobby_1',
           isMultiKey: true,
           multiKeyPaths: { hobby: [ 'hobby' ] },
           ......
           indexBounds: { hobby: [ '["钓鱼", "钓鱼"]' ] }
        }
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}

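Compared with other indexes, a multikey index has more entries and takes more space, because it stores one entry per array element. For example, if the hobby array holds 10 elements per document on average, the hobby index on this collection has about 10 times as many entries as an ordinary index.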
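
The growth in entries can be sketched with a simple count. This is an illustration in plain JavaScript, not MongoDB's actual key generation: indexEntryCount and sampleHobbyDocs are hypothetical, and the treatment of empty arrays and of duplicates inside one array is an assumption stated in the comments.

```javascript
// Hypothetical helper: estimate how many index entries one field produces.
// A scalar (or missing) field contributes one entry per document;
// an array contributes one entry per distinct element.
function indexEntryCount(docs, field) {
  let entries = 0;
  for (const doc of docs) {
    const value = doc[field];
    if (Array.isArray(value)) {
      // Assumptions: duplicates inside one array are stored once,
      // and an empty array still yields a single entry.
      entries += Math.max(new Set(value).size, 1);
    } else {
      entries += 1;
    }
  }
  return entries;
}

const sampleHobbyDocs = [
  { username: "user1", hobby: ["singing", "basketball", "rap"] },
  { username: "user2", hobby: ["fishing", "table tennis"] },
  { username: "user3" } // no hobby field
];

console.log(indexEntryCount(sampleHobbyDocs, "hobby"));    // 6
console.log(indexEntryCount(sampleHobbyDocs, "username")); // 3
```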

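Compound multikey indexes

A compound multikey index is a compound index that includes an array field. For any single document, at most one of the indexed fields may be an array; an index cannot cover two array fields of the same document. This avoids an explosion in the number of index entries: if two array fields with n and m elements were indexed together, the document would need n*m entries instead of one.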
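
The n*m figure is just the size of a Cartesian product. The sketch below, in plain JavaScript, shows what one document would cost if indexing two arrays together were allowed; MongoDB rejects such a document with a "cannot index parallel arrays" error instead. compoundKeys and the tags field are hypothetical.

```javascript
// Hypothetical helper: every (a, b) pair that a compound index over two
// array fields would have to store for a single document.
function compoundKeys(first, second) {
  const keys = [];
  for (const a of first) {
    for (const b of second) {
      keys.push([a, b]);
    }
  }
  return keys;
}

const hobbyValues = ["singing", "basketball", "rap"];       // n = 3
const tagValues = ["student", "gamer", "runner", "reader"]; // m = 4, hypothetical field

console.log(compoundKeys(hobbyValues, tagValues).length); // 12
```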

Geospatial indexes

Add some location data to the existing users collection:

for(var i = 0;i < 100000;i++){
    db.users.updateOne(
    {username:"user"+i},
    {
        $set:{
            location:{
                type: "Point",
                coordinates: [100+Math.random() * 4,40+Math.random() * 3]
            }
        }
    });
}

Create a 2dsphere index, which supports queries on an Earth-like sphere:

db.users.createIndex({location:"2dsphere"})
'location_2dsphere'

// Find people within 500 meters
db.users.find({
  location:{
    $near:{
      $geometry:{type:"Point",coordinates:[102,41.5]},
      $maxDistance:500
    }
  }
})

A geospatial (GeoJSON) object can have many types, including Point, LineString, and Polygon.
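
For GeoJSON points, $maxDistance is given in meters and coordinates are ordered [longitude, latitude]. To get a feel for what a 500-meter radius means, the following sketch computes great-circle distances with the haversine formula in plain JavaScript. It is an approximation on a perfect sphere, not MongoDB's own implementation; haversineMeters and the two sample points are hypothetical.

```javascript
const EARTH_RADIUS_METERS = 6371000; // mean Earth radius, an approximation

// Great-circle distance between two [longitude, latitude] pairs, in meters.
function haversineMeters([lng1, lat1], [lng2, lat2]) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
}

const queryCenter = [102, 41.5];       // same center as the $near query above
const nearbyPoint = [102.003, 41.502]; // hypothetical user a few hundred meters away
const distantPoint = [103, 42];        // hypothetical user far outside the radius

console.log(haversineMeters(queryCenter, nearbyPoint) <= 500);  // true
console.log(haversineMeters(queryCenter, distantPoint) <= 500); // false
```

Since the test data spreads 100000 points over about 4 degrees of longitude and 3 degrees of latitude, a 500-meter query around any given center may well return nothing.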

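TTL indexes

TTL stands for time to live. A TTL index is mainly used to delete expired data automatically. To use one, the documents need a field of date type, and when creating the TTL index on that field you also set expireAfterSeconds, the expiration time in seconds. Once the index exists, MongoDB periodically checks the data in the collection, and when

current time - value of the TTL-indexed field > expireAfterSeconds

holds, MongoDB deletes those documents automatically. TTL indexes also come with the following restrictions: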

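A TTL index can only have one field; there is no compound TTL index
A TTL index cannot be used on a capped collection
The TTL task walks through the entries one by one and deletes each expired document with a regular delete operation, so it is not very efficient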
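
The expiration condition above can be written as a small predicate. This is a sketch in plain JavaScript of the comparison only, not of the server's background task; isExpired and the sample dates are hypothetical, and array-valued date fields are not handled.

```javascript
// Hypothetical helper: does the document satisfy the TTL condition at time `now`?
function isExpired(doc, field, expireAfterSeconds, now = new Date()) {
  const value = doc[field];
  // Documents whose field is not a date are treated as never expiring.
  if (!(value instanceof Date)) return false;
  return now.getTime() - value.getTime() > expireAfterSeconds * 1000;
}

const createdAt = new Date("2024-01-01T00:00:00Z"); // hypothetical timestamp
const sampleDoc = { username: "user90000", createdDate: createdAt };

const after30s = new Date(createdAt.getTime() + 30 * 1000);
const after61s = new Date(createdAt.getTime() + 61 * 1000);

console.log(isExpired(sampleDoc, "createdDate", 60, after30s)); // false
console.log(isExpired(sampleDoc, "createdDate", 60, after61s)); // true
console.log(isExpired({ username: "user1" }, "createdDate", 60, after61s)); // false
```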

First, add a date field to some of the documents:

for(var i = 90000;i < 100000;i++){
    db.users.updateOne(
    {username:"user"+i},
    {
        $set:{
            createdDate:new Date()
        }
    });
}

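Create a TTL index with an expiration time of 60 seconds. Query again once the 60 seconds have passed and the documents are gone. The background TTL task runs roughly once a minute, so the deletion may lag a little behind the expiration time.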

db.users.createIndex({createdDate:1},{expireAfterSeconds:60})
'createdDate_1'

You can also change the expiration time of a TTL index with the collMod command:

db.runCommand({
  collMod:"users",
  index:{
    keyPattern:{createdDate:1},
    expireAfterSeconds:120
  }
})

{ expireAfterSeconds_old: 60, expireAfterSeconds_new: 120, ok: 1 }

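A conditional index, also called a partial index, only indexes the documents that satisfy a filter condition.

Build the username_1 index only for users older than 50. In the execution plan, the isPartial field becomes true. An index named username_1 was already created earlier, so drop it first to keep the names from colliding.

// Drop the earlier non-partial index that has the same name
db.users.dropIndex("username_1")
db.users.createIndex({username:1},{partialFilterExpression:{
    age:{$gt:50}
  }})
'username_1'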

db.users.find({$and:[{username:"user4"},{age:60}]}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        filter: { age: { '$eq': 60 } },
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { username: 1 },
           indexName: 'username_1',
           ......
           isPartial: true,
           ......
         } 
     },
     rejectedPlans: [] 
  },
  ......
  ok: 1 
}
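
Which documents end up in a partial index can be pictured as a filter followed by key extraction. The sketch below uses plain JavaScript on an in-memory sample; partialIndexKeys and samplePartialDocs are hypothetical, and the predicate stands in for the partialFilterExpression { age: { $gt: 50 } }.

```javascript
// Hypothetical helper: keys stored by a partial index on `keyField`,
// where `predicate` plays the role of partialFilterExpression.
function partialIndexKeys(docs, keyField, predicate) {
  return docs
    .filter(predicate)
    .map((doc) => doc[keyField])
    .sort();
}

const samplePartialDocs = [
  { username: "user1", age: 30 },
  { username: "user2", age: 60 },
  { username: "user3", age: 75 },
  { username: "user4", age: 50 } // not included: 50 is not greater than 50
];

const olderThan50 = (doc) => doc.age > 50;

console.log(partialIndexKeys(samplePartialDocs, "username", olderThan50));
// [ 'user2', 'user3' ]
```

This also suggests why the query above includes an age condition: the index only holds users older than 50, so it can only answer queries whose filter stays inside that range.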

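An ordinary index on a field covers the whole collection. Even if a document does not contain the field, the index still includes that document, treating the missing field as null.

A sparse index does not index documents in which the field is absent. If the field exists but is null, the document is still indexed.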
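
The difference between the two behaviors can be sketched in a few lines of plain JavaScript. regularIndexKeys, sparseIndexKeys, and sampleEmailDocs are hypothetical names, and the example.com address is made up for illustration.

```javascript
// Ordinary index: every document gets an entry; a missing field is indexed as null.
function regularIndexKeys(docs, field) {
  return docs.map((doc) => (field in doc ? doc[field] : null));
}

// Sparse index: documents without the field are skipped; an explicit null is kept.
function sparseIndexKeys(docs, field) {
  return docs.filter((doc) => field in doc).map((doc) => doc[field]);
}

const sampleEmailDocs = [
  { username: "user5000", email: "user5000@example.com" },
  { username: "user9000", email: null },
  { username: "user0" } // no email field at all
];

console.log(regularIndexKeys(sampleEmailDocs, "email").length); // 3
console.log(sparseIndexKeys(sampleEmailDocs, "email").length);  // 2
```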

Below, add an email field to part of the users collection: a string value for some documents and null for others.

for(var i = 5000;i < 10000;i++){
  if(i < 9000){
    db.users.updateOne(
      {username:"user"+i},
      { $set:{email:(120000000+i)+"@qq.email"}}
    )
  }else{
    db.users.updateOne(
      {username:"user"+i},
      { $set:{email:null}}
    )
  }
}

Without an index, querying with the condition {email:null} also returns documents that have no email field at all:

db.users.find({email:null})
{ 
  _id: ObjectId("61bdc01ba59136670f6536fd"),
  username: 'user0',
  age: 64.41483801726282,
  sex: 0,
  phone: 18468150001,
  location: 
  { 
    type: 'Point',
    coordinates: [ 101.42490900320335, 42.2576650823515 ] 
  } 
}
......

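Then create a sparse index on the email field and query with {email:null} again, using hint() to force the sparse index. Every document returned now has an email field that exists and is null. Without the hint, MongoDB would avoid the sparse index for this query, because using it would give an incomplete result.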

db.users.createIndex({email:1},{sparse:true});
'email_1'

db.users.find({email:null}).hint({email:1})
{ 
  _id: ObjectId("61bdc12ca59136670f655a25"),
  username: 'user9000',
  age: 94.18397576757012,
  sex: 0,
  phone: 18468159001,
  hobby: [ '钓鱼', '乒乓球' ],
  location: 
  { 
    type: 'Point',
    coordinates: [ 101.25903151863596, 41.38450145025062 ] 
  },
  email: null 
}
......

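A text index tokenizes the indexed fields into words before searching. It does not currently support Chinese word segmentation.

Below, insert documents with two text fields and create a compound text index: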

db.blog.insertMany([
  {title:"hello world",content:"mongodb is the best database"},
  {title:"index",content:"efficient data structure"}
])

// Create the index
db.blog.createIndex({title:"text",content:"text"})
'title_text_content_text'
// Query using the text index
db.blog.find({$text:{$search:"hello data"}})
{ 
  _id: ObjectId("61c092268c4037d17827d977"),
  title: 'index',
  content: 'efficient data structure' 
},
{ 
  _id: ObjectId("61c092268c4037d17827d976"),
  title: 'hello world',
  content: 'mongodb is the best database' 
}
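
Both documents come back because a $search string with several words matches a document that contains any one of them. The following plain JavaScript sketch imitates that OR behavior with naive tokenization; it is a simplification, since a real text index also applies stemming and stop-word removal, and tokenize, textSearch, and sampleBlog are hypothetical names.

```javascript
// Naive tokenizer: lowercase, then split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// A document matches if any indexed field contains any of the search terms.
function textSearch(docs, fields, search) {
  const terms = new Set(tokenize(search));
  return docs.filter((doc) =>
    fields.some((field) =>
      tokenize(String(doc[field] ?? "")).some((token) => terms.has(token))
    )
  );
}

const sampleBlog = [
  { title: "hello world", content: "mongodb is the best database" },
  { title: "index", content: "efficient data structure" }
];

console.log(textSearch(sampleBlog, ["title", "content"], "hello data").length); // 2
console.log(textSearch(sampleBlog, ["title", "content"], "mongodb").length);    // 1
```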

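A unique index guarantees that the indexed field contains no duplicate values. Besides single-field unique indexes, there are compound unique indexes and unique multikey indexes (where the arrays of different documents may not share any element).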

// Create a unique index on the title field
db.blog.createIndex({title:1},{unique:true})
'title_1'
// Insert a title value that already exists
db.blog.insertOne({title:"hello world",content:"mongodb is the best database"})
MongoServerError: E11000 duplicate key error collection: mock.blog index: title_1 dup key: { : "hello world" }
// Check the execution plan; isUnique is true
db.blog.find({"title":"index"}).explain()
{ 
  queryPlanner: 
  { 
     ......
     winningPlan: 
     { 
        stage: 'FETCH',
        inputStage: 
        { 
           stage: 'IXSCAN',
           keyPattern: { title: 1 },
           indexName: 'title_1',
           isMultiKey: false,
           multiKeyPaths: { title: [] },
           isUnique: true,
           ......
         } 
     },
     rejectedPlans: [] 
  },
  .......
  ok: 1 
}
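
For array fields, the unique constraint means that no element may appear in the arrays of two different documents. The sketch below checks that rule in plain JavaScript; findDuplicateKey and the tags field are hypothetical, and repeated elements inside a single document's array are assumed not to count as a violation.

```javascript
// Hypothetical helper: return the first key that appears in two different
// documents, or null if the field would satisfy a unique index.
function findDuplicateKey(docs, field) {
  const seen = new Set();
  for (const doc of docs) {
    const value = doc[field];
    // Keys of this document, deduplicated so a repeat inside one array is allowed.
    const keys = Array.isArray(value) ? new Set(value) : new Set([value]);
    for (const key of keys) {
      if (seen.has(key)) return key;
    }
    for (const key of keys) {
      seen.add(key);
    }
  }
  return null;
}

console.log(findDuplicateKey([{ tags: ["a", "b"] }, { tags: ["c", "a"] }], "tags")); // a
console.log(findDuplicateKey([{ tags: ["a", "a"] }, { tags: ["b"] }], "tags"));      // null
console.log(findDuplicateKey([{ title: "index" }, { title: "index" }], "title"));    // index
```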
