Cursor-Like Stages
sort.
skip.
limit.
count.
A basic query
// get just the name && numberOfMoons, projecting vals
db.solarSystem
.find(
{},
{
_id: 0,
name: 1,
numberOfMoons: 1,
},
)
.pretty();
count
count the number of docs
db.solarSystem.find(
{},
{
_id: 0,
name:1,
numberOfMoons:1
}).count()
// count where the type is terrestrial planet
// the $count counts all incoming docs
db.solarSystem.aggregate([
{
$match: { type: "Terrestrial planet" }
},
{
$project: {
_id:0,
name:1,
numberOfMoons:1
}
},
{
// specify the name of the field here as "terrestrial planets"
$count: "terrestrial planets"
}
])
// should return...
{ "terrestrial planets" : 4 }
// In the terrestrial planet example above, the project is not even necessary. Could be simplified
db.solarSystem.aggregate([
{
$match: { type: "Terrestrial planet" }
},
{
$count: "terrestrial planets"
}
])
skip
Skipping docs to return
// without _sorting_ the results, the order returned is the order at which they were inserted, the `natural order`
// below, skipping the elements in the order they were inserted into the collection
db.solarSystem
.find(
{},
{
_id: 0,
name: 1,
numberOfMoons: 1,
},
)
.skip(1)
.pretty();
// same output as...
db.solarSystem.aggregate([
{
$project: {
_id: 0,
name: 1,
numberOfMoons: 1,
},
},
{ $skip: 1 },
]);
limit
limit the number of results
db.solarSystem
.find(
{},
{
_id: 0,
name: 1,
numberOfMoons: 1,
},
)
.limit(5)
.pretty();
// same output as...
db.solarSystem.aggregate([
{
$project: {
_id: 0,
name: 1,
numberOfMoons: 1,
},
},
{ $limit: 5 },
]);
sort
sort the docs
db.solarSystem.find(
{},
{
_id: 0,
name:1,
numberOfMoons:1
})
.sort({
numberOfMoons: -1
})
.pretty()
// with agg
db.solarSystem.aggregate([
{
$project:{
_id: 0,
name: 1,
numberOfMoons: 1
}
},
{
$sort: {
numberOfMoons: -1
}
}
])
// sort can operate on multiple fields in combination:
db.solarSystem.aggregate([
{
$project:{
_id: 0,
name: 1,
hasMagneticField: 1,
numberOfMoons: 1
}
},
{
$sort: {
hasMagneticField: -1,
numberOfMoons: -1
}
}
])
// will return...
{ "name" : "Jupiter", "numberOfMoons" : 67, "hasMagneticField" : true }
{ "name" : "Saturn", "numberOfMoons" : 62, "hasMagneticField" : true }
{ "name" : "Uranus", "numberOfMoons" : 27, "hasMagneticField" : true }
{ "name" : "Neptune", "numberOfMoons" : 14, "hasMagneticField" : true }
{ "name" : "Earth", "numberOfMoons" : 1, "hasMagneticField" : true }
{ "name" : "Sun", "numberOfMoons" : 0, "hasMagneticField" : true }
{ "name" : "Mercury", "numberOfMoons" : 0, "hasMagneticField" : true }
{ "name" : "Mars", "numberOfMoons" : 2, "hasMagneticField" : false }
{ "name" : "Venus", "numberOfMoons" : 0, "hasMagneticField" : false }
see that the false
magnetic fields are all under the true
magnetic fields, in descending numberOfMoons
count
SORT & MEMORY USAGE: Sort by default only can take 100mb of memory.
When sort is early in the pipeline, it can take advantage of indexes for optimization.
Sorts in agg should be paired with allowDiskUse: true
in the pipeline to accommodate larger sorts.
Complex Example I
For movies
- released in the USA
- with a
tomatoes.viewer.rating
greater than or equal to 3 ... calculate a new field callednum_favs
- that represents how many "favorites" appear in the
cast
field of the movie
- that represents how many "favorites" appear in the
- Sort the results by
num_favs
,tomatoes.viewer.rating
, andtitle
, all in descending order QUESTION: What is the title of the 25th film in the aggregation result? A: "The Heat". Blam.
myFavs = [
'Sandra Bullock',
'Tom Hanks',
'Julia Roberts',
'Kevin Spacey',
'George Clooney',
];
// my approach
db.movies.aggregate([
{
$match: {
countries: { $in: ['USA'] },
'tomatoes.viewer.rating': { $gte: 3 },
},
},
{
$project: {
_id: 0,
title: 1,
rating: '$tomatoes.viewer.rating',
num_favs: {
$size: {
$cond: {
if: {
$setIntersection: ['$cast', myFavs],
},
then: {
$setIntersection: ['$cast', myFavs],
},
else: [],
},
},
},
},
},
{
$sort: { num_favs: -1, rating: -1, title: -1 },
},
{ $limit: 25 },
]);
// another more succinct approach
db.movies.aggregate([
{
$match: {
'tomatoes.viewer.rating': { $gte: 3 },
countries: 'USA',
cast: {
$in: favorites,
},
},
},
{
$project: {
_id: 0,
title: 1,
'tomatoes.viewer.rating': 1,
num_favs: {
$size: {
$setIntersection: ['$cast', favorites],
},
},
},
},
{
$sort: {
num_favs: -1,
'tomatoes.viewer.rating': -1,
title: -1,
},
},
{
$skip: 24,
},
{
$limit: 1,
},
]);
Differences between my approach and the shorter approach
- i did not
match
on cast - i re-named the nested rating val to a parent-level val
- i did not
setIntersection
- i did an
if
case, comparing the favs to the setIntersection results - the shorter approach just returned the
setIntersection
between the$cast
file and thefavs
array
- i did an
- I returned all 25 results in order to find the 25th result
- the shorter approach skipped the first 24 results
- the shorter approach only returned 1 item
Complex Example II
Calculate an average rating for each movie in the movies
collection
- where English is an available language
- the minimum
imdb.rating
is at least 1 - the minimum
imdb.votes
is at least 1 - it was released in 1990 or after.
You'll be required to rescale (or normalize)
imdb.votes
. The formula to rescale imdb.votes and calculate normalized_rating is included as a handout.
What film has the lowest normalized_rating?
db.movies.aggregate([
{
$match: {
countries: { $in: ['USA'] },
'imdb.rating': { $gte: 1 },
'imdb.votes': { $gte: 1 },
year: { $gte: 1990 },
languages: { $in: ['English'] },
},
},
{
$project: {
_id: 0,
title: 1,
votes: '$imdb.votes',
rating: '$imdb.rating',
scaled_votes: {
$add: [
1,
{
$multiply: [
9,
{
$divide: [
{ $subtract: ['$imdb.votes', 5] },
{ $subtract: [1521105, 5] },
],
},
],
},
],
},
normailized_rating: {
$avg: ['$scaled_votes', '$imdb.rating'],
},
},
},
{
$sort: { normailized_rating: 1 },
},
{
$limit: 1,
},
]);