MongoDB Index Building on ReplicaSet and Shard Cluster

source link: https://www.percona.com/blog/mongodb-index-building-on-replicaset-and-shard-cluster/

We all know how important a proper index is for a database to do its job effectively. We use indexing in daily life too for important tasks; without an index, all the tasks would still get done, but in a relatively long time.

The basic working of an index

Imagine that we have tons of information, we want to look at one very particular piece of it, and we don't know where it is. We are going to spend a lot of time finding that particular piece of data.

If we had some information about where each piece of data lives, the job would finish very quickly, because we would know where to look without searching each and every record for one particular item.

Indexes are special data structures that store some information of records to traverse to that particular data. Indexes can be created in ascending or descending order to support efficient equality matches and range-based query operations.
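As a rough sketch of the idea (plain Python, not MongoDB internals), an ordered index can be modeled as a sorted list of (key, document position) pairs searched with binary search, which supports both equality and range queries without a full scan:

```python
import bisect

# Collection: documents identified by position; the "index" is a sorted
# list of (key, position) pairs, like an ordered index on "sku".
docs = [{"sku": "b7"}, {"sku": "a1"}, {"sku": "c9"}, {"sku": "a5"}]
index = sorted((d["sku"], pos) for pos, d in enumerate(docs))
keys = [k for k, _ in index]

def find(sku):
    """Equality match via binary search instead of a full scan."""
    i = bisect.bisect_left(keys, sku)
    return index[i][1] if i < len(keys) and keys[i] == sku else None

def find_range(lo, hi):
    """Range query: a contiguous slice of the ordered index."""
    return [pos for k, pos in index[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]]

print(find("a5"))              # position of the matching document
print(find_range("a1", "b7"))  # positions of keys in [a1, b7]
```

Because the entries are kept in order, a range query is just a contiguous slice of the index, which is why indexes support efficient range-based operations as well as equality matches.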

Index building strategy and consideration

When building an index, many aspects have to be considered: the key fields that are queried most frequently, their cardinality, the write ratio on that collection, and the available memory and storage.

If there are no indexes on the collection, MongoDB will do a full collection scan every time any type of query is performed, even when the collection contains millions of records. This will not only slow down that operation but will also increase the wait time for other operations.

We can also create multiple indexes at the same time on the same collection with the createIndexes command, saving the time that would otherwise be spent scanning the collection once per index.


Limitations

It is very important to have enough memory to accommodate the working set. It is not necessary for all indexes to fit in RAM.

The index key had to be less than 1024 bytes up to v4.0. Starting with v4.2 and fcv 4.2, this limit is removed.

Similarly, an index name can be up to 127 bytes in a db with fcv 4.0 and below. This limit is also removed with db v4.2 and fcv 4.2.

A single collection can have no more than 64 indexes.

Index types in MongoDB

Before seeing various index types, let’s see what the index name looks like.

The default name for an index is the concatenation of the indexed keys and each key’s direction in the index ( i.e. 1 or -1) using underscores as a separator. For example, an index created on { mobile : 1, points: -1 } has the name mobile_1_points_-1.
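The default-name rule can be sketched in a few lines of plain Python (an illustration of the naming convention, not MongoDB code):

```python
def default_index_name(key_spec):
    # Join each field with its direction using underscores, e.g.
    # {"mobile": 1, "points": -1} -> "mobile_1_points_-1".
    return "_".join(f"{field}_{direction}" for field, direction in key_spec.items())

print(default_index_name({"mobile": 1, "points": -1}))  # mobile_1_points_-1
```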

We can also create a custom, more human-readable name 

Shell
db.products.createIndex({ mobile: 1, points: -1 }, { name: "query for rewards points" })

Index type

MongoDB provides various types of indexes to support various data and queries.

Single field index: In a single-field index, an index is created on a single field in a document. MongoDB can traverse a single-field index in either direction, regardless of the sort order specified when creating it.

Syntax:

Shell
db.collection.createIndex({"<fieldName>" : <1 or -1>})

Here 1 represents the field specified in ascending order and -1 for descending order.

Example:

Shell
db.inventory.createIndex({productId:1});

Compound index: In a compound index, we can create indexes on multiple fields. The order of fields listed in a compound index has significance. For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each userid value, sorts by score.

Syntax:

Shell
db.collection.createIndex({ <field1>: <1/-1>, <field2>: <1/-1>, ... })

Example:

Shell
db.students.createIndex({ userid: 1, score: -1 })
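The effect of the field order can be illustrated in plain Python (not MongoDB code): the index above keeps entries sorted by userid ascending, then by score descending within each userid.

```python
docs = [
    {"userid": "u2", "score": 70},
    {"userid": "u1", "score": 50},
    {"userid": "u1", "score": 90},
]

# Same ordering as the { userid: 1, score: -1 } index:
# ascending userid, then descending score within each userid.
ordered = sorted(docs, key=lambda d: (d["userid"], -d["score"]))
print([(d["userid"], d["score"]) for d in ordered])
# [('u1', 90), ('u1', 50), ('u2', 70)]
```

This is why a compound index can serve queries that filter on userid alone or on userid plus score, but not efficiently on score alone.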

Multikey index: MongoDB uses multikey indexes to index the content stored in arrays. When we create an index on a field that contains an array value, MongoDB automatically creates a separate index entry for every element of the array. We do not need to specify the multikey type explicitly; MongoDB automatically decides whether to create a multikey index based on whether the indexed field contains an array value.

Syntax:

Shell
db.collection.createIndex({ <field1>: <1/-1> })

Example:

Shell
db.students.createIndex({ "addr.zip":1})
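Conceptually, a multikey index holds one entry per array element. A rough Python sketch of that idea (illustration only, not how MongoDB stores index entries internally):

```python
# Documents whose "tags" field holds an array value.
docs = {
    1: {"tags": ["red", "sale"]},
    2: {"tags": ["blue", "sale"]},
}

# A multikey index stores one entry per array element, mapping each
# element back to the documents that contain it.
multikey = {}
for doc_id, doc in docs.items():
    for tag in doc["tags"]:
        multikey.setdefault(tag, set()).add(doc_id)

print(multikey["sale"])  # {1, 2}
```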

Geospatial index: MongoDB provides two special indexes: 2d indexes that use planar geometry when returning results and 2dsphere indexes that use spherical geometry to return results.

Syntax:

Shell
db.collection.createIndex({ <location field> : "2dsphere" })

*where the <location field> is a field whose value is either a GeoJSON object or a legacy coordinate pair.

Example:

Shell
db.places.createIndex({ loc : "2dsphere" })

Text index: With the text index type, MongoDB supports searching for string content in a collection. A collection can only have one text search index, but that index can cover multiple fields.

Syntax:

Shell
db.collection.createIndex({ <field1>: "text" })

Example:

Shell
db.reviews.createIndex({ comments: "text" })
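The core idea behind a text index is an inverted index from words to documents. A minimal plain-Python sketch (real text indexes also apply stemming and stop-word removal, omitted here):

```python
reviews = [
    (1, "great coffee and cake"),
    (2, "the coffee was cold"),
]

# Tokenize the string field and build an inverted index that maps
# each word to the set of documents containing it.
text_index = {}
for doc_id, comment in reviews:
    for word in comment.lower().split():
        text_index.setdefault(word, set()).add(doc_id)

print(sorted(text_index["coffee"]))  # [1, 2]
```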

Hashed index: MongoDB stores the hash value of the indexed field in the case of a hashed index. This type of index is mainly useful where we want an even data distribution, e.g., in a sharded cluster environment.

Syntax:

Shell
db.collection.createIndex({ _id: "hashed"  })

From version 4.4 onwards, compound hashed indexes are also supported.
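The even-distribution property can be sketched in plain Python. This uses md5 purely for illustration; MongoDB's hashed index uses its own 64-bit hash function, and the shard-picking here is a simplification of hashed sharding:

```python
import hashlib

def pick_shard(key, num_shards=3):
    # Deterministic hash of the key, mapped to a shard by modulo.
    # (md5 for illustration only; not MongoDB's actual hash function.)
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Even monotonically increasing keys spread out across the shards,
# which is why hashed indexes suit hashed sharding.
placement = {pick_shard(i) for i in range(100)}
print(sorted(placement))
```

A plain ascending key such as a timestamp would send all new writes to one shard; hashing breaks that pattern.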

Properties

Unique indexes: When specified, MongoDB rejects duplicate values for the indexed field. It will not allow inserting another document containing the same value for the indexed key.

Shell
> db.cust_details.createIndex({Cust_id:1},{unique:true})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.cust_details.insert({"Cust_id":"39772","Batch":"342"})
WriteResult({ "nInserted" : 1 })
> db.cust_details.insert({"Cust_id":"39772","Batch":"452"})
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: student.cust_details index: Cust_id_1 dup key: { Cust_id: \"39772\" }"
    }
})
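The constraint itself can be sketched in plain Python (illustration only; the names cust_index and insert_doc are hypothetical, and the error text mimics, not reproduces, E11000):

```python
cust_index = {}

def insert_doc(doc):
    # Reject a duplicate value for the unique key, analogous to
    # MongoDB's E11000 duplicate key error.
    key = doc["Cust_id"]
    if key in cust_index:
        raise ValueError(f"duplicate key: Cust_id {key!r}")
    cust_index[key] = doc

insert_doc({"Cust_id": "39772", "Batch": "342"})      # first insert succeeds
try:
    insert_doc({"Cust_id": "39772", "Batch": "452"})  # duplicate rejected
except ValueError as e:
    print(e)  # duplicate key: Cust_id '39772'
```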

Partial indexes: Partial indexes only index the documents that match the filter criteria.

Shell
db.restaurants.createIndex({ cuisine: 1, name: 1 },{ partialFilterExpression: { rating: { $gt: 5 } } })
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
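What the partial index ends up containing can be sketched in plain Python (illustration only): only the documents that satisfy the filter get index entries, keeping the index smaller.

```python
restaurants = [
    {"name": "A", "cuisine": "thai",  "rating": 8},
    {"name": "B", "cuisine": "thai",  "rating": 3},
    {"name": "C", "cuisine": "cuban", "rating": 6},
]

# A partial index only indexes documents matching the filter,
# here rating > 5, mirroring the partialFilterExpression above.
partial_index = sorted(
    (r["cuisine"], r["name"]) for r in restaurants if r["rating"] > 5
)
print(partial_index)  # [('cuban', 'C'), ('thai', 'A')]
```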

TTL indexes: TTL indexes are special single-field indexes that can be used to auto-delete documents from the collection after a specified amount of time.

Shell
db.eventlog.createIndex({ "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 })
lastModifiedDate_1
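The deletion rule can be sketched in plain Python (illustration only; in MongoDB a background TTL monitor applies it periodically, so deletion is not instantaneous):

```python
from datetime import datetime, timedelta, timezone

expire_after = timedelta(seconds=3600)  # mirrors expireAfterSeconds: 3600
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

eventlog = [
    {"msg": "old", "lastModifiedDate": now - timedelta(hours=2)},
    {"msg": "new", "lastModifiedDate": now - timedelta(minutes=10)},
]

# Keep only documents whose indexed date is newer than expireAfterSeconds;
# the rest are eligible for TTL deletion.
kept = [e for e in eventlog if now - e["lastModifiedDate"] < expire_after]
print([e["msg"] for e in kept])  # ['new']
```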

Sparse indexes: Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value.

Shell
db.addresses.createIndex({ "email": 1 }, { sparse: true })
email_1
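The distinction between a missing field and a null value can be sketched in plain Python (illustration only):

```python
addresses = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2, "email": None},  # field present with null value: indexed
    {"_id": 3},                 # field missing: skipped by a sparse index
]

# A sparse index only has entries for documents where the field exists.
sparse_index = [(a["email"], a["_id"]) for a in addresses if "email" in a]
print(sparse_index)  # [('a@example.com', 1), (None, 2)]
```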

Hidden indexes: Hidden indexes are not visible to the query planner and cannot be used to support a query. Apart from being hidden from the planner, hidden indexes behave like unhidden indexes.

To create a new hidden index:

Shell
db.addresses.createIndex({ pincode: 1 },{ hidden: true });

To change an existing index into a hidden one (works only with db having fcv 4.4 or greater):

Shell
db.addresses.hideIndex({ pincode: 1 }); // Specify the index key specification document
db.addresses.hideIndex( "pincode_1" );  // Specify the index name

To unhide any hidden index:

Either the index name or the key can be used to unhide the index.

Shell
db.addresses.unhideIndex({ pincode: 1 }); // Specify the index key specification document
db.addresses.unhideIndex( "pincode_1" );  // Specify the index name

Rolling index builds on replica sets

Starting from MongoDB 4.4, index builds happen simultaneously on all data-bearing replica set members. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy instead.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the primary for a replica set.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without the node falling so far behind that it cannot catch up.

Procedure

1. Stop one secondary and restart as a standalone on a different port number.

In this process, we stop one secondary node at a time, comment out the replication settings in its configuration file, and set disableLogicalSessionCacheRefresh to true under the setParameter section.

Example

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27217
#   port: 27017
#replication:
#   replSetName: myRepl
setParameter:
   disableLogicalSessionCacheRefresh: true

We only need to make changes in the above settings, the rest will remain the same.

Once the above changes are done, save the file and restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

Now, the mongod process will start on port 27217 in standalone mode.

2. Build the index

Connect to the mongod instance on port 27217. Switch to the desired database and collection to create an index.

Example:

Shell
mongo --port 27217 -u 'username' --authenticationDatabase admin
> use student
switched to db student
> db.studentData.createIndex( { StudentID: 1 } );
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

3. Restart the mongod process as a replica set member

After the desired index build completes, we can add the node back to the replica set.

Undo the configuration file change made in step one above. Restart the mongod process with the original configuration file.

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27017
replication:
   replSetName: myRepl

After saving the configuration file, restart the process and let it become secondary.

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

4. Repeat the above procedure for the remaining secondaries

Once the node becomes a secondary again and there is no replication lag, repeat the procedure on the remaining secondaries, one node at a time.

  1. Stop one secondary and restart as a standalone.
  2. Build the index.
  3. Restart the mongod process as a replica set member.

5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index:
  2. Stop the (former primary, now secondary) node and restart it as a standalone.
  3. Build the index.
  4. Restart the mongod process as a replica set member.

Rolling index builds on sharded clusters

Starting from MongoDB 4.4, index builds happen simultaneously on all data-bearing replica set members. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy instead.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the primary for a replica set.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without the node falling so far behind that it cannot catch up.

Procedure

1. Stop the balancer

In order to create an index in a rolling fashion in a sharded cluster, it is necessary to stop the balancer so that we do not end up with inconsistent indexes across shards.

Connect to mongos instance and run sh.stopBalancer() to disable the balancer.

If there is any active migration going on, the balancer will stop only after the completion of the ongoing migration.

We can check whether the balancer is stopped with the command below:

Shell
sh.getBalancerState()

If the balancer is stopped, the output will be false.

2. Determine the distribution of the collection

In order to build indexes in a rolling fashion, it is necessary to know on which shards the collections are residing. 

Connect to one of the mongos and refresh the cache so that we get fresh distribution information of collections in the shard for which we want to build the index.

Example:

We want to create an index on the studentData collection in the students database.

We will run the command below to refresh the cached distribution information for that collection:

Shell
db.adminCommand( { flushRouterConfig: "students.studentData" } );
Shell
db.studentData.getShardDistribution();

We will get output listing the shards that contain the collection:

Shell
Shard shardA at shardA/s1-mongo1.net:27018,s1-mongo2.net:27018,s1-mongo3.net:27018
data : 1KiB docs : 50 chunks : 1
estimated data per chunk : 1KiB
estimated docs per chunk : 50
Shard shardC at shardC/s3-mongo1.net:27018,s3-mongo2.net:27018,s3-mongo3.net:27018
data : 1KiB docs : 50 chunks : 1
estimated data per chunk : 1KiB
estimated docs per chunk : 50
Totals
data : 3KiB docs : 100 chunks : 2
Shard shardA contains 50% data, 50% docs in cluster, avg obj size on shard : 40B
Shard shardC contains 50% data, 50% docs in cluster, avg obj size on shard : 40B

From the above output, we can see that students.studentData exists on shardA and shardC, so we need to build the indexes on shardA and shardC, respectively.

3. Build indexes on the shards that contain collection chunks

Follow the procedure below on each shard that contains chunks of the collection.

3.1. Stop one secondary and restart as a standalone

For the identified shard, stop one of the secondary nodes and make the following changes.

  • Change the port number to a different port
  • Comment out replication parameters
  • Comment out sharding parameters
  • Under section “setParameter” add skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true 

Example

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27218
#   port: 27018
#replication:
#   replSetName: shardA
#sharding:
#   clusterRole: shardsvr
setParameter:
 skipShardingConfigurationChecks: true
 disableLogicalSessionCacheRefresh: true

After saving the configuration, restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

3.2. Build the index

Connect to the mongod instance running on standalone mode and start the index build process.

Here, we are building an index on the StudentID field of the students collection, in ascending order:

Shell
> db.students.createIndex( { StudentID: 1 } )
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

3.3. Restart the MongoDB process as a replica set node

Once the index build activity is finished, shut down the instance and restart it with the original configuration, removing the parameters skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true.

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27018
replication:
   replSetName: shardA
sharding:
   clusterRole: shardsvr

After saving the configuration, restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

3.4. Repeat the procedure for the remaining secondaries for the shard

Once the node on which the index build completed has been added back to the replica set and is in sync with the other nodes, repeat the above process (3.1 to 3.3) on the remaining nodes.

3.1. Stop one secondary and restart as a standalone

3.2. Build the index

3.3. Restart the MongoDB process as replicaset node

3.5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index:
  2. Stop the secondary node and restart it as a standalone.
  3. Build the index.
  4. Restart the mongod process as a replica set member.

4. Repeat for the other affected shards

Once the index build is finished on one of the identified shards, repeat the process outlined in step three on the next identified shard.

5. Restart the balancer

Once we are done building the index on all identified shards we can start the balancer again.

Connect to a mongos instance in the sharded cluster, and run sh.startBalancer()

Shell
sh.startBalancer()

Conclusion

Picking the right key based on the access pattern and having one good index is better than having multiple bad indexes. So, choose your indexes wisely.

There are also other interesting blogs on https://www.percona.com/blog/ which might be helpful to you.

I also recommend trying Percona Server for MongoDB, which provides MongoDB enterprise-grade features without any license fees (it is free). You can learn more about it in the blog post MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona also offers other great products for MongoDB, like Percona Backup for MongoDB and Percona Operator for MongoDB, as well as for other technologies and tools: MySQL Software, PostgreSQL Distribution, Percona Operators, and Monitoring & Management.

