Storage Engines

MongoDB writes data through a storage engine: MMAPv1, WiredTiger, or another.
WiredTiger is the current default storage engine.

There are several data-management objects:

databases (dbs), logical groups of collections
collections, operational units that group docs together
indexes, defined on collections over fields in docs
documents, atomic units of info made up of fields & values

Available Storage Engines

WiredTiger is the default (since MongoDB v3.2)
The In-Memory Storage Engine is another option (MongoDB Enterprise only)
MMAPv1 was available in older versions; it was deprecated in v4.0 and removed in v4.2

On db startup

Many files are created.

for each collection, WiredTiger writes an individual file
for each index, WiredTiger writes an individual file
_mdb_catalog.wt is the MongoDB catalog
- it holds the catalog of all collections + indexes that the mongod instance contains
how these files are laid out on disk is configurable at startup

Default mongodb data path

MongoDB defaults its db path, where it persists data, to /data/db.
This can be changed with the --dbpath option of the mongod command, which boots up the mongod instance: mongod --dbpath /some/other/dir
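
mongod does not create a missing data directory; it exits with an error instead. A minimal sketch of pointing an instance at a custom path, assuming a scratch location under /tmp (the path and the log file name are illustrative and do not come from these notes):

```shell
# Hypothetical scratch location; substitute your own data directory.
DEMO_DBPATH="/tmp/mongo-demo/db"

# mongod exits with an error if the dbpath does not exist, so create it first.
mkdir -p "$DEMO_DBPATH"

# Start the instance only when mongod is installed on this machine.
if command -v mongod >/dev/null 2>&1; then
  # --fork needs a log destination, hence --logpath.
  mongod --dbpath "$DEMO_DBPATH" --logpath "$DEMO_DBPATH/mongod.log" --fork \
    || echo "mongod failed to start; check $DEMO_DBPATH/mongod.log"
else
  echo "mongod not found on PATH; created $DEMO_DBPATH only"
fi
```

The guard around mongod is only there so the sketch is safe to run on a machine without MongoDB installed.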

Create a Folder per db

A cli flag can change how data is laid out on disk:

mongod --dbpath /my/db/path --logpath /my-db/logpath/mongodb.log --directoryperdb

Notice the --directoryperdb flag.
With it, mongod creates a new folder on db creation: one folder for each database that the mongod has created.
Inside each database-specific directory are that db's WiredTiger files. For example, after creating a single db with a single collection, a single index, and a single document, there will be a new directory in the /data/db dir, representing the db. Inside the db's directory will be 2 WiredTiger files:

one representing a collection
one representing an index

mongo hello --eval 'db.a.insert({a:1}, {writeConcern: {w:1, j:true}})'

The dbpath will reveal different data organization, including a dir called hello, the new db. This hello dir contains a collection file and an index file.

a hello directory
- the new db
- a new dir per db
  - each of these db-specific dirs will contain
    - unique files per collection
    - unique files per index
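
One way to check this layout is to count the files in a database directory. The sketch below assumes the collection-*.wt and index-*.wt file naming that WiredTiger commonly uses; the helper name count_wt_files is made up for this example.

```shell
# Count WiredTiger collection and index files inside one database directory.
# Assumes the collection-*.wt / index-*.wt naming; adjust if your version differs.
count_wt_files() {
  db_dir="$1"
  # A missing directory simply reports zero files.
  if [ ! -d "$db_dir" ]; then
    echo "collections=0 indexes=0"
    return 0
  fi
  collections=$(find "$db_dir" -maxdepth 1 -name 'collection-*.wt' | wc -l | tr -d ' ')
  indexes=$(find "$db_dir" -maxdepth 1 -name 'index-*.wt' | wc -l | tr -d ' ')
  echo "collections=$collections indexes=$indexes"
}

# /data/db/hello is the directory created for the hello db in the example above.
count_wt_files /data/db/hello
```

For the hello db above, this should report one collection file and one index file, the index being the default one on _id.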

Create Nested folder per index and collection

In addition to the above --directoryperdb flag, another flag can separate indexes from collections into two directories. Here, the --wiredTigerDirectoryForIndexes flag is added:

mongod --dbpath /data/db --fork --logpath /data/db/mongod.log --directoryperdb --wiredTigerDirectoryForIndexes
Creating a new db, collection, and doc will result in a new dir structure. For example, for a db called mytestdb:

- data
  - db
    - mytestdb
      - collection
      - index
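
The same options can also be kept in a config file instead of on the command line. A sketch that writes the YAML equivalents of the flags above; the config file location is an assumption made for this example:

```shell
# Hypothetical location for the config file.
CONF="/tmp/mongod-demo.conf"

# YAML equivalents of --dbpath, --directoryperdb,
# --wiredTigerDirectoryForIndexes, --logpath and --fork.
cat > "$CONF" <<'EOF'
storage:
  dbPath: /data/db
  directoryPerDB: true
  wiredTiger:
    engineConfig:
      directoryForIndexes: true
systemLog:
  destination: file
  path: /data/db/mongod.log
processManagement:
  fork: true
EOF

# Start the instance with: mongod --config "$CONF"
echo "wrote $CONF"
```

Keeping the layout options in a file makes it harder to restart the instance with a different set of flags by accident.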

Why do this?

Better I/O and parallelization

When several hard disks are present in the server, separate directories for collections and indexes can enable better I/O parallelization.
Symbolic links can point these directories at mount points on different physical drives.
On most reads and writes the db uses both the index and the data, so putting them on separate disks shares the workload across 2 data stores. A write updates the data AND the index at the same time.
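
The symlink idea can be sketched with two scratch directories standing in for mount points on different drives. All paths here are hypothetical, and on a real server this would be done with mongod stopped:

```shell
# Stand-ins for two mount points on different physical drives (hypothetical paths).
DEMO_ROOT="/tmp/mongo-symlink-demo"
DATA_DISK="$DEMO_ROOT/disk1"
INDEX_DISK="$DEMO_ROOT/disk2"
DB_DIR="$DATA_DISK/db/mytestdb"

# Start from a clean scratch area and recreate the layout shown above.
rm -rf "$DEMO_ROOT"
mkdir -p "$DB_DIR/collection" "$DB_DIR/index" "$INDEX_DISK"

# Move the index directory to the second drive...
mv "$DB_DIR/index" "$INDEX_DISK/mytestdb-index"
# ...and leave a symlink behind so it is still reachable at the usual path.
ln -s "$INDEX_DISK/mytestdb-index" "$DB_DIR/index"

ls -l "$DB_DIR"
```

After this, reads and writes under index go to the second drive while collection stays on the first.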

Can include compression

Data can be compressed.
This can make things faster through smaller reads/writes, at the COST of more cpu cycles for compressing and decompressing the data.
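
The compressor for collection data can be chosen at startup. A sketch that builds the command for a given compressor; the scratch dbpath is an assumption, and the command is only printed, not run:

```shell
# Block compressors accepted by WiredTiger: none, snappy (the default), zlib,
# and zstd (MongoDB 4.2 and later).
COMPRESSOR="zlib"
# Hypothetical scratch data directory for this sketch.
COMPRESS_DBPATH="/tmp/mongo-compress-demo/db"

case "$COMPRESSOR" in
  none|snappy|zlib|zstd) ;;
  *) echo "unknown compressor '$COMPRESSOR', falling back to snappy" >&2
     COMPRESSOR="snappy" ;;
esac

# Build the startup command; run it by hand once the directory exists.
CMD="mongod --dbpath $COMPRESS_DBPATH --wiredTigerCollectionBlockCompressor $COMPRESSOR"
echo "$CMD"
```

As far as I know the setting only affects collections created after it is applied; existing collections keep the compressor they were created with.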

Data is also held in memory before it is written to disk. Users can require acknowledgement that a write has reached disk with the writeConcern option; the readConcern option controls how durable data must be before a read returns it.
Checkpoints, internal processes that run at set sync periods, regulate how data is flushed/synced from RAM to disk.

Journaling

The journal acts as a safeguard against data corruption.
Data stored in the journal can be used to restore data after incomplete data-writes or an unexpected shutdown.
The journal has its own file structure that includes individual write ops.
Journal flushes are done using group commits in a compressed format.
All writes to the journal file are atomic.
Users can force acknowledgement that the journal has been updated with a {j:true} obj in the write concern of an insert command:

db.coll.insert({a:1}, {writeConcern: {w:1, j:true}})

When data moves from memory to disk

There are 2 ways that data gets written to disk and moved out of memory:

the db itself performs "checkpoints" at regular "sync periods", every 60 seconds by default (see docs)