Blog Posts
Sharded Collection Architecture & Setup
Sharding and the Config DB
The Shard Key
Hashed Shard Keys
Shard Chunks
Balancing Shards
How the Mongos Server Routes Queries to Shards
About Sharding
In Mongo, scaling can be done horizontally.
Instead of making single machines better with more ram/hd/cpu (vertical scaling), more machines are added && the data is distributed across machines.
The data gets divided. Into shards.
Shards make up sharded clusters.
Each shard is a replica set.
Queries become complex
Mongo sets up a type of router process, the mongos
. Clients connect to this rather than the mongod process(es) itself.
Mongos
doesn't "know" anything. Mongos
uses metada on config servers to know which data is on which shard.
The data on the config server is duplicated through a replica sets.
When to Shard
Looking for indicators
Economic impact
Is it still economically viable to vertically scale?
Will adding more cpu/network/ram/disk actually help?
scenario
Current architecture: 100$/hr per server.
Next higher server costs 1k$ / hr with performance 2x performance increase.
This does not make economic sense.
Operational Impact
Will increasing disk help solve operational problems?
scenario
Wanting to increase HD for more data storage.
PRO - more data.
CON - more time to sync data, to backup data, network impact for these things. Also, more data needs more indexes, which are stored in ram, and may tangentially require more ram.
SHARDING can be helpful to allow for parallelization.
Suggestions
Individual servers should have 2-5 TB of data.
More becomes too time-consuming to operate.
Some workloads just work better in distribution: single-threaded operations, geographically distributed data, etc.
An example of a single-thread operation is an aggregation pipeline.
Sharding will help speed up aggregation pipelines.
Consider sharding when
- more than 5TB of data are on each server && operational costs are increasing
- geographically close data is required
- powerful machines are getting outgrown
dont necessarily need to consider sharding when
- disks are full
- start a new mongo project