Home
Tags:

Intro to Data Modeling

introduction to data modeling for mongodb

Some constraints with apps:

  • hardware
    • ram
    • SSD/HD size
  • data
    • size
    • security
  • app
    • network latency
  • db server
    • MAX SIZE OF 16MB
    • atomicity of updates

The Working Set of data is the total body of data that the app normally uses.

The model of data is defined by hardware & by the nature of the datasets.
Constraints and their impacts are important to identify as contributors to data models.
The model should get updated as the tech and landscape changes.

A method

Describe the workload

  • user scenarios
  • business domain experts for usability details
  • get logs and stats about the current system(s) involved
  • assemble all info in a schema, by a data modeling expert
  • guess at size the data over time - will get this wrong, but awareness of these details over time will be helpful when iterating over schema changes
  • figure out how many operations are run at a time
    • latency
    • tolerance to staleness

See a more complete write-up on describing the workload here

ID the relationships between the entities

In a relational database, this could be like actors collection, reviews collection and movies collection. In NoSQL, there is a choice: to embed or relate.

See a more complete write-up on identifying data relationships here

Apply design patterns or transformations to the current model for performance improvements

Make models more performant or clear, by applying transformations.

Modeling for simplicity or performance

Simplicity

avoid complexities that will slow down engineering speed.
Quick.
limited expectations.
CPU, Disk, I/O, Memory - these are usually simple. Few collections, leveraging sub-docs, with simple relationships: one-to-one, one-to-many, etc. Large documents, less disk-access: 1 read will be able to return a LOT of data.
Simplicity favors embedding docs.

Performance

Sharding.
Fast reads, fast writes.
Support a lot of operations.
Complex projects, perhaps done by larger teams. Performance probably leads to a lot of collections.

Simple First, Performance after

It is easier to find optimization later than it is to remove extra complexity.

Schema design patterns can apply to application types. Here is an overview of schema design patterns and how/where they could apply to application types.

Other Sections

Schema Design Anti-Patterns
Identifying the Workload
Map out Data Relationships
Review and Apply Schema Design Patters