MongoDB Index Building on ReplicaSet and Shard Cluster

source link: https://www.percona.com/blog/mongodb-index-building-on-replicaset-and-shard-cluster/

We all know how important a proper index is for a database to do its job effectively. We use indexing in daily life too for important tasks; without an index, all the tasks would still get done, but in a relatively long time.

The basic working of an index

Imagine that we have tons of information, we want to look at one very particular piece of it, and we don't know where it is. We are going to spend a lot of time finding that particular piece of data.

If we had some information about where each piece of data lives, the job would finish very quickly, because we would know where to look without searching each and every record for one particular item.

Indexes are special data structures that store some information of records to traverse to that particular data. Indexes can be created in ascending or descending order to support efficient equality matches and range-based query operations.
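As a rough sketch of the idea (plain Python, not MongoDB internals), an ordered index can be modeled as a sorted list of (key, document position) pairs searched with binary search, which supports both equality and range queries without a full scan:

```python
import bisect

# Collection: documents identified by position; the "index" is a sorted
# list of (key, position) pairs, like an ordered index on "sku".
docs = [{"sku": "b7"}, {"sku": "a1"}, {"sku": "c9"}, {"sku": "a5"}]
index = sorted((d["sku"], pos) for pos, d in enumerate(docs))
keys = [k for k, _ in index]

def find(sku):
    """Equality match via binary search instead of a full scan."""
    i = bisect.bisect_left(keys, sku)
    return index[i][1] if i < len(keys) and keys[i] == sku else None

def find_range(lo, hi):
    """Range query: a contiguous slice of the ordered index."""
    return [pos for k, pos in index[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]]

print(find("a5"))              # position of the matching document
print(find_range("a1", "b7"))  # positions of keys in [a1, b7]
```

Because the entries are kept in order, a range query is just a contiguous slice of the index, which is why indexes support efficient range-based operations as well as equality matches.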

Index building strategy and consideration

When building an index, many aspects have to be considered: the key fields that are queried most frequently, their cardinality, the write ratio on that collection, and the available memory and storage.

If there are no indexes on the collection, MongoDB will do a full collection scan every time any type of query is performed, even when the collection contains millions of records. This will not only slow down that operation but will also increase the wait time for other operations.

We can also create multiple indexes at the same time on the same collection with the createIndexes command, saving the time that would otherwise be spent scanning the collection once per index.


Limitations

It is very important to have enough memory to accommodate the working set. It is not necessary for all indexes to fit in RAM.

The index key had to be less than 1024 bytes up to v4.0. Starting with v4.2 and fcv 4.2, this limit is removed.

Similarly, an index name can be up to 127 bytes in a db with fcv 4.0 and below. This limit is also removed with db v4.2 and fcv 4.2.

A single collection can have no more than 64 indexes.

Index types in MongoDB

Before seeing various index types, let’s see what the index name looks like.

The default name for an index is the concatenation of the indexed keys and each key’s direction in the index ( i.e. 1 or -1) using underscores as a separator. For example, an index created on { mobile : 1, points: -1 } has the name mobile_1_points_-1.
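The default-name rule can be sketched in a few lines of plain Python (an illustration of the naming convention, not MongoDB code):

```python
def default_index_name(key_spec):
    # Join each field with its direction using underscores, e.g.
    # {"mobile": 1, "points": -1} -> "mobile_1_points_-1".
    return "_".join(f"{field}_{direction}" for field, direction in key_spec.items())

print(default_index_name({"mobile": 1, "points": -1}))  # mobile_1_points_-1
```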

We can also create a custom, more human-readable name 

Shell
db.products.createIndex({ mobile: 1, points: -1 }, { name: "query for rewards points" })

Index type

MongoDB provides various types of indexes to support various data and queries.

Single field index: In a single-field index, an index is created on a single field in a document. MongoDB can traverse a single-field index in either direction, regardless of the sort order specified when creating it.

Syntax:

Shell
db.collection.createIndex({"<fieldName>" : <1 or -1>})

Here 1 represents the field specified in ascending order and -1 for descending order.

Example:

Shell
db.inventory.createIndex({productId:1});

Compound index: In a compound index, we can create indexes on multiple fields. The order of fields listed in a compound index has significance. For instance, if a compound index consists of { userid: 1, score: -1 }, the index sorts first by userid and then, within each userid value, sorts by score.

Syntax:

Shell
db.collection.createIndex({ <field1>: <1/-1>, <field2>: <1/-1>, ... })

Example:

Shell
db.students.createIndex({ userid: 1, score: -1 })
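The effect of the field order can be illustrated in plain Python (not MongoDB code): the index above keeps entries sorted by userid ascending, then by score descending within each userid.

```python
docs = [
    {"userid": "u2", "score": 70},
    {"userid": "u1", "score": 50},
    {"userid": "u1", "score": 90},
]

# Same ordering as the { userid: 1, score: -1 } index:
# ascending userid, then descending score within each userid.
ordered = sorted(docs, key=lambda d: (d["userid"], -d["score"]))
print([(d["userid"], d["score"]) for d in ordered])
# [('u1', 90), ('u1', 50), ('u2', 70)]
```

This is why a compound index can serve queries that filter on userid alone or on userid plus score, but not efficiently on score alone.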

Multikey index: MongoDB uses multikey indexes to index the content stored in arrays. When we create an index on a field that contains an array value, MongoDB automatically creates a separate index entry for every element of the array. We do not need to specify the multikey type explicitly; MongoDB automatically decides whether to create a multikey index based on whether the indexed field contains an array value.

Syntax:

Shell
db.collection.createIndex({ <field1>: <1/-1> })

Example:

Shell
db.students.createIndex({ "addr.zip":1})
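Conceptually, a multikey index holds one entry per array element. A rough Python sketch of that idea (illustration only, not how MongoDB stores index entries internally):

```python
# Documents whose "tags" field holds an array value.
docs = {
    1: {"tags": ["red", "sale"]},
    2: {"tags": ["blue", "sale"]},
}

# A multikey index stores one entry per array element, mapping each
# element back to the documents that contain it.
multikey = {}
for doc_id, doc in docs.items():
    for tag in doc["tags"]:
        multikey.setdefault(tag, set()).add(doc_id)

print(multikey["sale"])  # {1, 2}
```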

Geospatial index: MongoDB provides two special indexes: 2d indexes that use planar geometry when returning results and 2dsphere indexes that use spherical geometry to return results.

Syntax:

Shell
db.collection.createIndex({ <location field> : "2dsphere" })

*where the <location field> is a field whose value is either a GeoJSON object or a legacy coordinate pair.

Example:

Shell
db.places.createIndex({ loc : "2dsphere" })

Text index: With the text index type, MongoDB supports searching for string content in a collection. A collection can only have one text search index, but that index can cover multiple fields.

Syntax:

Shell
db.collection.createIndex({ <field1>: "text" })

Example:

Shell
db.reviews.createIndex({ comments: "text" })
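The core idea behind a text index is an inverted index from words to documents. A minimal plain-Python sketch (real text indexes also apply stemming and stop-word removal, omitted here):

```python
reviews = [
    (1, "great coffee and cake"),
    (2, "the coffee was cold"),
]

# Tokenize the string field and build an inverted index that maps
# each word to the set of documents containing it.
text_index = {}
for doc_id, comment in reviews:
    for word in comment.lower().split():
        text_index.setdefault(word, set()).add(doc_id)

print(sorted(text_index["coffee"]))  # [1, 2]
```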

Hashed index: MongoDB stores the hash value of the indexed field in the case of a hashed index. This type of index is mainly useful where we want an even data distribution, e.g., in a sharded cluster environment.

Syntax:

Shell
db.collection.createIndex({ _id: "hashed"  })

From version 4.4 onwards, compound hashed indexes are also supported.
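The even-distribution property can be sketched in plain Python. This uses md5 purely for illustration; MongoDB's hashed index uses its own 64-bit hash function, and the shard-picking here is a simplification of hashed sharding:

```python
import hashlib

def pick_shard(key, num_shards=3):
    # Deterministic hash of the key, mapped to a shard by modulo.
    # (md5 for illustration only; not MongoDB's actual hash function.)
    digest = hashlib.md5(str(key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# Even monotonically increasing keys spread out across the shards,
# which is why hashed indexes suit hashed sharding.
placement = {pick_shard(i) for i in range(100)}
print(sorted(placement))
```

A plain ascending key such as a timestamp would send all new writes to one shard; hashing breaks that pattern.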

Properties

Unique indexes: When specified, MongoDB rejects duplicate values for the indexed field. It will not allow inserting another document containing the same value for the indexed key.

Shell
> db.cust_details.createIndex({Cust_id:1},{unique:true})
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
> db.cust_details.insert({"Cust_id":"39772","Batch":"342"})
WriteResult({ "nInserted" : 1 })
> db.cust_details.insert({"Cust_id":"39772","Batch":"452"})
WriteResult({
    "nInserted" : 0,
    "writeError" : {
        "code" : 11000,
        "errmsg" : "E11000 duplicate key error collection: student.cust_details index: Cust_id_1 dup key: { Cust_id: \"39772\" }"
    }
})
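The constraint itself can be sketched in plain Python (illustration only; the names cust_index and insert_doc are hypothetical, and the error text mimics, not reproduces, E11000):

```python
cust_index = {}

def insert_doc(doc):
    # Reject a duplicate value for the unique key, analogous to
    # MongoDB's E11000 duplicate key error.
    key = doc["Cust_id"]
    if key in cust_index:
        raise ValueError(f"duplicate key: Cust_id {key!r}")
    cust_index[key] = doc

insert_doc({"Cust_id": "39772", "Batch": "342"})      # first insert succeeds
try:
    insert_doc({"Cust_id": "39772", "Batch": "452"})  # duplicate rejected
except ValueError as e:
    print(e)  # duplicate key: Cust_id '39772'
```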

Partial indexes: Partial indexes only index the documents that match the filter criteria.

Shell
db.restaurants.createIndex({ cuisine: 1, name: 1 },{ partialFilterExpression: { rating: { $gt: 5 } } })
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}
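What the partial index ends up containing can be sketched in plain Python (illustration only): only the documents that satisfy the filter get index entries, keeping the index smaller.

```python
restaurants = [
    {"name": "A", "cuisine": "thai",  "rating": 8},
    {"name": "B", "cuisine": "thai",  "rating": 3},
    {"name": "C", "cuisine": "cuban", "rating": 6},
]

# A partial index only indexes documents matching the filter,
# here rating > 5, mirroring the partialFilterExpression above.
partial_index = sorted(
    (r["cuisine"], r["name"]) for r in restaurants if r["rating"] > 5
)
print(partial_index)  # [('cuban', 'C'), ('thai', 'A')]
```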

TTL indexes: TTL indexes are special single-field indexes that can be used to auto-delete documents from the collection after a specified amount of time.

Shell
db.eventlog.createIndex({ "lastModifiedDate": 1 }, { expireAfterSeconds: 3600 })
lastModifiedDate_1
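The deletion rule can be sketched in plain Python (illustration only; in MongoDB a background TTL monitor applies it periodically, so deletion is not instantaneous):

```python
from datetime import datetime, timedelta, timezone

expire_after = timedelta(seconds=3600)  # mirrors expireAfterSeconds: 3600
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

eventlog = [
    {"msg": "old", "lastModifiedDate": now - timedelta(hours=2)},
    {"msg": "new", "lastModifiedDate": now - timedelta(minutes=10)},
]

# Keep only documents whose indexed date is newer than expireAfterSeconds;
# the rest are eligible for TTL deletion.
kept = [e for e in eventlog if now - e["lastModifiedDate"] < expire_after]
print([e["msg"] for e in kept])  # ['new']
```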

Sparse indexes: Sparse indexes only contain entries for documents that have the indexed field, even if the index field contains a null value.

Shell
db.addresses.createIndex({ "email": 1 }, { sparse: true })
email_1
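The distinction between a missing field and a null value can be sketched in plain Python (illustration only):

```python
addresses = [
    {"_id": 1, "email": "a@example.com"},
    {"_id": 2, "email": None},  # field present with null value: indexed
    {"_id": 3},                 # field missing: skipped by a sparse index
]

# A sparse index only has entries for documents where the field exists.
sparse_index = [(a["email"], a["_id"]) for a in addresses if "email" in a]
print(sparse_index)  # [('a@example.com', 1), (None, 2)]
```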

Hidden indexes: Hidden indexes are not visible to the query planner and cannot be used to support a query. Apart from being hidden from the planner, hidden indexes behave like unhidden indexes.

To create a new hidden index:

Shell
db.addresses.createIndex({ pincode: 1 },{ hidden: true });

To change an existing index into a hidden one (works only with db having fcv 4.4 or greater):

Shell
db.addresses.hideIndex({ pincode: 1 }); // Specify the index key specification document
db.addresses.hideIndex( "pincode_1" );  // Specify the index name

To unhide any hidden index:

Either the index name or the key can be used to unhide the index.

Shell
db.addresses.unhideIndex({ pincode: 1 }); // Specify the index key specification document
db.addresses.unhideIndex( "pincode_1" );  // Specify the index name

Rolling index builds on replica sets

Starting from MongoDB 4.4, index builds happen simultaneously on all data-bearing replica set members. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy instead.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the primary for a replica set.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without the node falling so far behind that it cannot catch up.

Procedure

1. Stop one secondary and restart as a standalone on a different port number.

In this process, we stop one secondary node at a time, comment out the replication settings in its configuration file, and set disableLogicalSessionCacheRefresh to true under the setParameter section.

Example

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27217
#   port: 27017
#replication:
#   replSetName: myRepl
setParameter:
   disableLogicalSessionCacheRefresh: true

We only need to make changes in the above settings, the rest will remain the same.

Once the above changes are done, save the file and restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

Now, the mongod process will start on port 27217 in standalone mode.

2. Build the index

Connect to the mongod instance on port 27217. Switch to the desired database and collection to create an index.

Example:

Shell
mongo --port 27217 -u 'username' --authenticationDatabase admin
> use student
switched to db student
> db.studentData.createIndex( { StudentID: 1 } );
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

3. Restart the mongod process as a replica set member

After the desired index build completes, we can add the node back to the replica set.

Undo the configuration file change made in step one above. Restart the mongod process with the original configuration file.

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27017
replication:
   replSetName: myRepl

After saving the configuration file, restart the process and let it become secondary.

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

4. Repeat the above procedure for the remaining secondaries

Once the node becomes a secondary again and there is no replication lag, repeat the procedure on the remaining secondaries, one node at a time.

  1. Stop one secondary and restart as a standalone.
  2. Build the index.
  3. Restart the mongod process as a replica set member.

5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index:
  2. Stop the (former primary, now secondary) node and restart it as a standalone.
  3. Build the index.
  4. Restart the mongod process as a replica set member.

Rolling index builds on sharded clusters

Starting from MongoDB 4.4, index builds happen simultaneously on all data-bearing replica set members. For workloads that cannot tolerate the performance impact of an index build, we can follow a rolling index build strategy instead.

**NOTE**

Unique indexes

To create unique indexes using the following procedure, you must stop all writes to the collection during this procedure.

If you cannot stop all writes to the collection during this procedure, do not use the procedure on this page. Instead, build your unique index on the collection by issuing db.collection.createIndex() on the primary for a replica set.

Oplog size

Ensure that your oplog is large enough to permit the indexing or re-indexing operation to complete without the node falling so far behind that it cannot catch up.

Procedure

1. Stop the balancer

In order to create an index in a rolling fashion in a sharded cluster, it is necessary to stop the balancer so that we do not end up with inconsistent indexes across shards.

Connect to mongos instance and run sh.stopBalancer() to disable the balancer.

If there is any active migration going on, the balancer will stop only after the completion of the ongoing migration.

We can check whether the balancer is stopped with the command below:

Shell
sh.getBalancerState()

If the balancer is stopped, the output will be false.

2. Determine the distribution of the collection

In order to build indexes in a rolling fashion, it is necessary to know on which shards the collections are residing. 

Connect to one of the mongos and refresh the cache so that we get fresh distribution information of collections in the shard for which we want to build the index.

Example:

We want to create an index on the studentData collection in the students database.

We will run the command below to refresh the cached distribution information for that collection:

Shell
db.adminCommand( { flushRouterConfig: "students.studentData" } );
Shell
db.studentData.getShardDistribution();

We will get output listing the shards that contain the collection:

Shell
Shard shardA at shardA/s1-mongo1.net:27018,s1-mongo2.net:27018,s1-mongo3.net:27018
data : 1KiB docs : 50 chunks : 1
estimated data per chunk : 1KiB
estimated docs per chunk : 50
Shard shardC at shardC/s3-mongo1.net:27018,s3-mongo2.net:27018,s3-mongo3.net:27018
data : 1KiB docs : 50 chunks : 1
estimated data per chunk : 1KiB
estimated docs per chunk : 50
Totals
data : 3KiB docs : 100 chunks : 2
Shard shardA contains 50% data, 50% docs in cluster, avg obj size on shard : 40B
Shard shardC contains 50% data, 50% docs in cluster, avg obj size on shard : 40B

From the above output, we can see that students.studentData exists on shardA and shardC, so we need to build the indexes on shardA and shardC, respectively.

3. Build indexes on the shards that contain collection chunks

Follow the procedure below on each shard that contains chunks of the collection.

3.1. Stop one secondary and restart as a standalone

For the identified shard, stop one of the secondary nodes and make the following changes.

  • Change the port number to a different port
  • Comment out replication parameters
  • Comment out sharding parameters
  • Under section “setParameter” add skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true 

Example

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27218
#   port: 27018
#replication:
#   replSetName: shardA
#sharding:
#   clusterRole: shardsvr
setParameter:
 skipShardingConfigurationChecks: true
 disableLogicalSessionCacheRefresh: true

After saving the configuration, restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

3.2. Build the index

Connect to the mongod instance running on standalone mode and start the index build process.

Here, we are building an index on the StudentID field of the students collection, in ascending order:

Shell
> db.students.createIndex( { StudentID: 1 } )
{
    "createdCollectionAutomatically" : true,
    "numIndexesBefore" : 1,
    "numIndexesAfter" : 2,
    "ok" : 1
}

3.3. Restart the MongoDB process as a replica set node

Once the index build activity is finished, shut down the instance and restart it with the original configuration, removing the parameters skipShardingConfigurationChecks: true and disableLogicalSessionCacheRefresh: true.

Shell
   bindIp: localhost,<hostname(s)|ip address(es)>
   port: 27018
replication:
   replSetName: shardA
sharding:
   clusterRole: shardsvr

After saving the configuration, restart the process:

Shell
mongod --config <path/To/ConfigFile>

Or, if the node is managed by systemd:

Shell
sudo systemctl start mongod

3.4. Repeat the procedure for the remaining secondaries for the shard

Once the node on which the index build completed has been added back to the replica set and is in sync with the other nodes, repeat the above process (3.1 to 3.3) on the remaining nodes.

3.1. Stop one secondary and restart as a standalone

3.2. Build the index

3.3. Restart the MongoDB process as replicaset node

3.5. Index build on primary

Once index build activity finishes up in all the secondary nodes, use the same process as above to create an index on the last remaining node.

  1. Connect to the primary node and issue rs.stepDown(). Once it successfully steps down, it becomes a secondary and a new primary is elected. Then follow steps one through three to build the index:
  2. Stop the secondary node and restart it as a standalone.
  3. Build the index.
  4. Restart the mongod process as a replica set member.

4. Repeat for the other affected shards

Once the index build is finished on one of the identified shards, repeat the process outlined in step three on the next identified shard.

5. Restart the balancer

Once we are done building the index on all identified shards we can start the balancer again.

Connect to a mongos instance in the sharded cluster, and run sh.startBalancer()

Shell
sh.startBalancer()

Conclusion

Picking the right key based on the access pattern and having one good index is better than having multiple bad indexes. So, choose your indexes wisely.

There are also other interesting blogs on https://www.percona.com/blog/ which might be helpful to you.

I also recommend trying Percona Server for MongoDB, which provides MongoDB enterprise-grade features without any license fees (it is free). You can learn more about it in the blog post MongoDB: Why Pay for Enterprise When Open Source Has You Covered?

Percona also offers other great products for MongoDB, like Percona Backup for MongoDB and Percona Operator for MongoDB, as well as for other technologies and tools: MySQL Software, PostgreSQL Distribution, Percona Operators, and Monitoring & Management.

