Stateful Sets Are Similar To Deployments

Deployments can manage replicas of pods.
Stateful sets can too. Stateful sets, though, "maintain a sticky identifier" for each of the pods that the set "watches over".

Explanation Through Diagrams

Starting With A Database

Starting small, consider a "Full-Stack" application that requires a db. Here, let's say the DB is MongoDB, for the JS and application-first data architecture fans -

Diagram: a single server running one database instance.

Add Servers For Higher Availability

For more reliability, replication is required for DB resiliency. New servers with DBs get set up, and Mongo replica sets get introduced (this is a "high-level" example here):

  • application-to-db communication goes to the now "master" node
  • the "master" node gets replicated to 2 "slave" instances, where the data is cloned

Diagram: application traffic goes to the master DB instance, which replicates its data to Replica 1 and Replica 2 (each a database instance of its own).

The Deployment Order And Instructions Are Critical

With db replicas, the order in which this whole thing gets built really matters. This is a db-specific detail, not explicitly about K8s or stateful sets.

A DB With Replicas In K8s Requires More Specific and Critical Config

The deployment steps, here, need to fit the db requirements.
Also, the "ephemeral" nature of Kubernetes, without stateful sets, means that any pod + node can be destroyed and recreated without K8s really "caring" about which pod or node it was. This ephemeral nature works against db replication, because stable, identifiable db instances are critical to the success of a high-availability db setup.

May Not Be Needed

"If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas". Go to Deployments or ReplicaSets instead.

Consider Stateful Sets For Something Like DB Replication

Consider starting with a single server.

  • install a db on the server and get it working
  • to withstand failures, replica sets may get deployed onto several new servers

Build a few new servers.

  • a master server dataset gets replicated to other "slave" servers
  • application workloads go to the master server
  • the master db node "knows about" which slave node has the replica data on it
    • the issue here is that in K8s land, without stateful sets, this isn't really possible - pod names and addresses change every time a pod gets recreated

Stateful Set K8s Deployment Specifics

With StatefulSets, pods are...

  • created in a sequential order - master could be spun up first, then slave 1, then slave 2
  • assigned ordinal indexes, starting at 0, by the stateful set
  • get reliable unique names (db-0, db-1, db-2, etc)
    • these names can be relied on!!
  • given a stable, unique dns record that any app can use to access the pod

Scaling up is helped here because each new pod can clone its data from the previous instance.
Also, when scaling down, the highest-numbered pod is deleted first.

Definition file

This is similar to a deployment definition file.

# ss.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-set
  labels:
    app: db
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: mongodb
          image: mongo:5.0
  # unique to StatefulSets, not deployments
  serviceName: mongodb-h

Storage in Stateful Sets

A unique detail in a db replica set is that writes only go to the master.
This adds some specificity to the K8s setup:

  • a service that exposes the statefulSet can't be "normal" - the service in regular deployments balances requests across nodes
  • a "headless service" is needed here (see This other doc for more deets on headless services)

All pods share the same vol

Here, all pods in the stateful set will try to use the same volume:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-ss
  labels:
    app: db
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db
  # pod def template section
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: mongodb
        image: mongo:5.0
        volumeMounts:
        # note: the official mongo image keeps its data in /data/db
        - mountPath: /data-root-dir-i-forgot
          name: db-vol
      volumes:
      - name: db-vol
        persistentVolumeClaim:
          claimName: db-vol-claim
  # required for StatefulSets; matches the headless service above
  serviceName: mongodb-h

Diagram: one StorageClass, one Persistent Vol, and one Persistent Vol Claim, shared by Pod 1, Pod 2, and Pod 3.
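
For completeness, a sketch of that shared claim (the storageClassName and size here are assumptions borrowed from the volumeClaimTemplates example further down):

# db-vol-claim.yaml - the shared PVC referenced by claimName above
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-vol-claim
spec:
  accessModes:
    # ReadWriteOnce only works if all pods land on the same node;
    # sharing across nodes would need ReadWriteMany and a backend that supports it
    - ReadWriteOnce
  storageClassName: google-storage
  resources:
    requests:
      storage: 500Mi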

Each Pod Gets Its Own PVC + PV

Alternatively, a stateful set can deploy pods that each reference their own pvc, each bound to its own pv.
To do this, what looks nearly identical to a pvc definition file gets added to the statefulset def file under spec.volumeClaimTemplates. Note:

  • the stateful set creates the pods sequentially, in order
    • a pvc is created for each pod
    • the pvc is connected to a storageClass
    • the storageClass provisions a vol on the storage provider, here google (a sketch of such a StorageClass follows this list)
    • the storageClass creates a pv
    • the storageClass binds the pv to the pvc
  • those steps repeat for each pod in the replica set, in order
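
A sketch of what that google-storage StorageClass might look like, assuming GCE persistent disks as the provider (the provisioner and parameters here are assumptions, not from the notes above):

# google-storage-sc.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: google-storage
# in-tree GCE PD provisioner; newer clusters would use the pd.csi.storage.gke.io CSI driver instead
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard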

During Pod Failure

Stateful sets don't delete pvcs during pod failure/recreation. Stateful sets maintain "stable storage" for pods.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-ss
  labels:
    app: db
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db
  # pod def template section
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
      - name: mongodb
        image: mongo:5.0
        volumeMounts:
        - mountPath: /data-root-dir-i-forgot
          name: db-vol
  # like pvcs, but "templatized" for deployments
  # 1 pvc for each pod will be created
  volumeClaimTemplates:
  - metadata:
      name: db-vol
    spec:
      accessModes:
        - ReadWriteOnce
      storageClassName: google-storage
      resources:
        requests:
          storage: 500Mi
  # required for StatefulSets; matches the headless service above
  serviceName: mongodb-h

Diagram: the StorageClass provisions Persistent Vol 1, 2, and 3, each bound to its own Persistent Vol Claim (1, 2, 3), used by Pod 1, Pod 2, and Pod 3 respectively.
