Stateful Sets Are Similar To Deployments
Deployments can manage replicas of pods.
Stateful sets can too. Stateful sets, though, "maintain a sticky identity" for each of the pods that the set "watches over".
Explanation Through Diagrams
Starting With A Database
Starting small, consider a "full-stack" application that requires a DB. Here, let's say the DB is MongoDB, for the JS and application-first data architecture fans.
Add Servers For Higher Availability
For more reliability, replication is required for DB resiliency. New servers with DBs get set up, and Mongo Replica Sets get introduced (this is a "high-level" example here):
- application-to-db communication goes to the now-"master" node
- the "master" node gets replicated to 2 "slave" instances, where the data is cloned
The Deployment Order And Instructions Are Critical
With DB replicas, the order in which the whole thing gets built really matters. This is a DB-specific detail, not explicitly about K8s or stateful sets.
DBs With Replicas In K8s Require More Specific And Critical Config
The deployment steps here need to fit the DB's requirements.
Also, the "ephemeral" nature of Kubernetes, without stateful sets, means that any pod and node can be destroyed and recreated without K8s really "caring" about the nodes and pods. This ephemeral nature works against the goals of DB replication, because the replicated DB instances are critical to the success of a high-availability DB setup.
Stateful Sets May Not Be Needed
"If an application doesn't require any stable identifiers or ordered deployment, deletion, or scaling, you should deploy your application using a workload object that provides a set of stateless replicas". Go to Deployments or ReplicaSets instead.
Consider Stateful Sets For Something Like DB Replication
Consider starting with a single server.
- install a db on the server and get it working
- to withstand failures, replica sets may get deployed onto several new servers
Build a few new servers.
- a master server dataset gets replicated to other "slave" servers
- application workloads go to the master server
- the master db node "knows about" which slave node has the replica data on it
- the issue here is that in K8s land, without stateful sets, this is impossible due to the ephemeral nature of pods
Stateful Set K8s Deployment Specifics
With StatefulSets, pods are...
- created in a sequential order - master could be spun up first, then slave 1, then slave 2
- assigned ordinal indexes by the stateful set, starting at 0
- get reliable unique names (db-0, db-1, db-2, etc)
- these names can be relied on!!
- given a stable, unique dns record that any app can use to access the pod
Scaling is also helped here, because each new pod can copy its data from the previous instance.
Also, when scaling down or terminating, the pod with the highest ordinal (the latest one) is deleted first.
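Because those names and DNS records are stable, app config can point at specific replicas directly. A sketch, assuming the db-set / mongodb-h names from the definition file below, the default namespace, MongoDB's default port, and a hypothetical Mongo replica set called rs0:
# app-config.yaml (illustrative only)
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-db-config
data:
  # each host is <pod-name>.<headless-service-name>, resolvable in-namespace
  MONGO_URI: mongodb://db-set-0.mongodb-h:27017,db-set-1.mongodb-h:27017,db-set-2.mongodb-h:27017/?replicaSet=rs0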
Definition file
This is similar to a deployment definition file.
# ss.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-set
  labels:
    app: db
spec:
  replicas: 3
  selector:
    matchLabels:
      app: db
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: mongodb
          image: mongodb:5
  # unique to StatefulSets, not deployments
  serviceName: mongodb-h
Storage in Stateful Sets
A unique detail in a db replica set is that writes only go to the master.
This adds some specifics to the K8s setup:
- a service that exposes the statefulSet can't be "normal" - the service in regular deployments balances requests across the pods
- a "headless service" is needed here (see This other doc for more deets on headless services) - a sketch follows just below
All Pods Share The Same Vol
Here, all pods in the stateful set will try to use the same volume:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-ss
  labels:
    app: db
spec:
  # headless service governing the set (unique to StatefulSets)
  serviceName: mongodb-h
  replicas: 3
  selector:
    matchLabels:
      app: db
  # pod def template section
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: mongodb
          image: mongodb:5
          volumeMounts:
            - mountPath: /data-root-dir-i-forgot
              name: db-vol
      volumes:
        - name: db-vol
          persistentVolumeClaim:
            claimName: db-vol-claim
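For completeness, the db-vol-claim that every pod above mounts might look roughly like this - the access mode and size are assumptions, and sharing one volume across pods on different nodes generally needs storage that supports ReadWriteMany:
# db-vol-claim.yaml (assumed shape)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: db-vol-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 500Mi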
Each Pod Gets Its Own PVC + PV
Here, a stateful set can deploy pods that each reference their own pvc, where each is bound to their own pv.
Here, what looks nearly identical to a pvc definition file gets added to the statefulset def file under spec.volumeClaimTemplates. Note:
- the stateful set creates the pods sequentially, in order
- A Pvc is created for each pod
- A pvc is connected to a storageClass
- The storageClass provisions a vol on the storage provider, here google
- the storageClass creates a pv
- the storageClass binds the pv to the pvc
- those steps repeat for each pod in the replica set, in order
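A sketch of what that google-storage StorageClass could look like, assuming the in-tree GCE persistent disk provisioner (the provisioner and parameters here are assumptions, not from these notes):
# google-storage.yaml (assumed)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: google-storage
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard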
During Pod Failure
Stateful sets don't delete PVCs during pod failure/recreation. Stateful sets maintain "stable storage" for pods.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db-ss
  labels:
    app: db
spec:
  # headless service governing the set (unique to StatefulSets)
  serviceName: mongodb-h
  replicas: 3
  selector:
    matchLabels:
      app: db
  # pod def template section
  template:
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: mongodb
          image: mongodb:5
          volumeMounts:
            - mountPath: /data-root-dir-i-forgot
              name: db-vol
  # like pvcs, but "templatized" for the stateful set
  # 1 pvc for each pod will be created
  volumeClaimTemplates:
    - metadata:
        name: db-vol
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: google-storage
        resources:
          requests:
            storage: 500Mi