Learn About Aggregation Performance in MongoDb
Aggregation performance
2 Categories of agg queries
- realtime
- batch processing
"Realtime" processing...
- used to provide data to apps
- performance matters
- users trigger actions, actions trigger queries
"Batch" processing
- provided for analytics
- probably run less frequently than "realtime" processing queries
- performance is probably not as important
Index usage
Be sure that agg queries use fields with indexes.
The agg forms a pipeline.
SOME agg operators CAN use indexes, some can not.
Because of the flow, once one stage does not use an index, all following stages are not allowed to use indexes.
The query optimizer will try to re-org queries for better performance where possible.
use this to inspect the stats of the query -
db.orders.agg([...aggTHingHere]), {explain: true})
What parts can use indexes?
$match
can use indexes$sort
can use indexes- put these before other transformations, for optimal performance
- put
$limit
near$sort
- and toward front of pipes
- this can make there server do a
top-K
sorting algo- the
top-k
sort lets the server only allocate memory for the final number of docs - no in-memory processing of docs!!
- the
- this can make there server do a
- and toward front of pipes
Agg pipeline parts and indexes
See operators that use indexes at the front of the line.
put sort stages at the beginning of pipes. put sort before OTHER transformations.
When doing limit, put this at the beginning of the pipe, too.
agg memory constraints
- Aggs are limited to 16MB output limit
- This does NOT apply to stuff IN the pipelines.
- Mitigate reaching this by using the
$limit
and$project
- Each stage in the pipeline has a 100MB limit
- Use indexes
- When all else fails, add
{allowDiskUse: true}
in the obj following the agg pipelinedb.coll.agg([...], {allowDiskUse: true})
- Use this as a last resort measure
- This doesn't even work in
$graphLookup
Page Tags:
database
mongodb
aggregation
performance
indexing