Sharded MongoDB Cluster Setup on Google Kubernetes Engine

A sharded MongoDB cluster on Google Kubernetes Engine (GKE) gives your applications horizontal scalability and, once each shard is replicated, high availability. In this guide, we use Kubernetes StatefulSets to run the MongoDB containers with persistent storage, and we cover best practices for headless services, storage, and replica sets.

ZippyOPS offers consulting, implementation, and managed services to help enterprises deploy and manage clusters across DevOps, DevSecOps, Cloud, Automated Ops, Microservices, Infrastructure, and Security environments. Learn more about our services and solutions.

Diagram of a Sharded MongoDB Cluster deployed on Google Kubernetes Engine with StatefulSets and PersistentVolumes.

Understanding Key Kubernetes Concepts for a Sharded MongoDB Cluster

Before setting up your cluster, it’s important to understand the Kubernetes components we will use.

StatefulSets in a Sharded MongoDB Cluster

A StatefulSet manages pods while guaranteeing their ordering and unique identities. Unlike Deployments, StatefulSets give each pod a stable name and its own persistent storage, which makes them a good fit for stateful applications like MongoDB. If a pod is deleted or rescheduled, its replacement keeps the same name and reattaches to the same PersistentVolume, so the data remains intact.

StorageClass for a Sharded MongoDB Cluster

A StorageClass describes a type of storage available in a Kubernetes cluster. Each StorageClass names a provisioner, such as kubernetes.io/gce-pd for GCE persistent disks, that allocates storage dynamically when a claim requests it. Pods then receive the right type of persistent storage automatically.

PersistentVolume for MongoDB Data

A PersistentVolume (PV) represents a storage resource in the cluster. Pods claim PVs via PersistentVolumeClaims (PVCs), and these claims can persist beyond pod lifecycles. For a Sharded MongoDB Cluster, PVs ensure database files remain consistent and durable.

Headless Services in a Sharded MongoDB Cluster

Headless Services configure DNS for pods with matching labels but do not perform load balancing. This allows StatefulSet pods to have unique network identifiers. Consequently, MongoDB nodes can reliably communicate within the cluster.
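
As a rough illustration, the DNS name that a StatefulSet pod gets behind a headless Service follows the pattern pod-name.service-name.namespace.svc.cluster-domain. The sketch below builds that name in shell. The function name pod_fqdn is hypothetical, and cluster.local is assumed as the cluster domain (the Kubernetes default, which a cluster may override).

```shell
# Build the stable DNS name of a StatefulSet pod behind a headless Service.
# Usage: pod_fqdn <statefulset> <ordinal> <service> <namespace> [cluster-domain]
pod_fqdn() {
  # The cluster domain defaults to cluster.local (an assumption; check your cluster).
  printf '%s-%s.%s.%s.svc.%s\n' "$1" "$2" "$3" "$4" "${5:-cluster.local}"
}

# Example: the first config server pod used later in this guide.
pod_fqdn mongodb-configdb 0 mongodb-configdb-headless-service daemonsl
```

The example call prints mongodb-configdb-0.mongodb-configdb-headless-service.daemonsl.svc.cluster.local, the same host name the mongos routers are given in Step 3.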

For more technical detail, see the Kubernetes StatefulSet documentation.

Prerequisites for a Sharded MongoDB Cluster on GKE

Ensure the following are in place on your Linux host before proceeding:

GCP Cloud SDK (gcloud)
Authentication to a GCP project
Kubernetes CLI (kubectl)
Configured Kubernetes credentials
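
A quick way to check the list above is a small shell script such as the following sketch. The helper name missing_tools is hypothetical. The script only looks for the two binaries on PATH and, if both are found, prints the active project and context so you can confirm they are the ones you expect.

```shell
# Print the name of every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

missing=$(missing_tools gcloud kubectl)
if [ -n "$missing" ]; then
  # $missing is left unquoted on purpose so the names print on one line.
  echo "Missing tools:" $missing
else
  # Both calls only read local configuration.
  gcloud config get-value project || echo "no gcloud project configured"
  kubectl config current-context || echo "no kubectl context configured"
fi
```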

Step 1: Create Namespace, StorageClass, and PersistentVolumes

Our Sharded MongoDB Cluster will consist of:

1 Config Server (StatefulSet)
2 Shards, each a MongoDB replica set with 1 member (StatefulSet)
2 Mongos Routers (Deployment)

First, create a Kubernetes namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: daemonsl

Apply the namespace:

kubectl apply -f namespace.yaml
kubectl get ns

Next, define a StorageClass for SSD disks:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd

Create GCE SSD disks:

gcloud compute disks create --size 10GB --type pd-ssd pd-ssd-disk-10g-1
gcloud compute disks create --size 10GB --type pd-ssd pd-ssd-disk-10g-2
gcloud compute disks create --size 5GB --type pd-ssd pd-ssd-disk-5g-1

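Define one PersistentVolume per disk (two of 10GB for the shards and one of 5GB for the config server), set storageClassName: fast on each so the claims in Step 2 can bind to them, and apply them with kubectl.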
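
The guide does not reproduce the PersistentVolume templates, so the following is only a sketch of what they might look like. It assumes the in-tree gcePersistentDisk volume source, an ext4 file system, and the Retain reclaim policy. The helper render_pv and the data-volume-* names are hypothetical, and newer clusters may prefer a CSI-based volume definition.

```shell
# Print a PersistentVolume manifest that wraps an existing GCE disk.
# Usage: render_pv <pv-name> <gce-disk-name> <capacity>
render_pv() {
  cat <<EOF
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: $1
spec:
  capacity:
    storage: $3
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast
  gcePersistentDisk:
    pdName: $2
    fsType: ext4
EOF
}

# One PV per disk created above; the PV names are hypothetical.
render_pv data-volume-10g-1 pd-ssd-disk-10g-1 10Gi
render_pv data-volume-10g-2 pd-ssd-disk-10g-2 10Gi
render_pv data-volume-5g-1  pd-ssd-disk-5g-1  5Gi
```

If the script is saved as render-pvs.sh (a hypothetical file name), piping its output into kubectl apply -f - would create the three volumes.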

Step 2: Deploy StatefulSets for a Sharded MongoDB Cluster

Config Server StatefulSet

Create a headless service and StatefulSet for the Config Server:

apiVersion: v1
kind: Service
metadata:
  name: mongodb-configdb-headless-service
  namespace: daemonsl
spec:
  ports:
    - port: 27019
      targetPort: 27019
  clusterIP: None
  selector:
    role: mongodb-configdb
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb-configdb
  namespace: daemonsl
spec:
  serviceName: mongodb-configdb-headless-service
  replicas: 1
  selector:
    matchLabels:
      role: mongodb-configdb
  template:
    metadata:
      labels:
        role: mongodb-configdb
        tier: configdb
    spec:
      containers:
        - name: mongodb-configdb-container
          image: mongo
          command: ["mongod","--port","27019","--dbpath","/mongo-disk","--bind_ip","0.0.0.0","--configsvr"]
          volumeMounts:
            - name: mongodb-configdb-persistent-storage-claim
              mountPath: /mongo-disk
  volumeClaimTemplates:
    - metadata:
        name: mongodb-configdb-persistent-storage-claim
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
        storageClassName: fast

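Apply the configuration with kubectl. Note that MongoDB 3.4 and later require config servers to run as a replica set, so with a current mongo image you also need to pass --replSet to this mongod command, initiate that replica set with rs.initiate(), and give mongos its --configdb value in the form <replSetName>/<host>:<port>.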

MainDB Shards StatefulSets

Create StatefulSets for shard1 and shard2, each with 10GB storage. Use headless services for DNS and VolumeClaimTemplates to ensure persistence.
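
The shard manifests are not shown in this guide, so the sketch below generates one plausible version of them, modelled on the config server manifest. Resource, container, and replica set names follow the mongodb-shardN-* and ShardN names used by the commands in Step 4. The render_shard helper, the tier: maindb label, the claim name, and the choice of the --shardsvr and --replSet flags are assumptions.

```shell
# Print a headless Service and a StatefulSet for shard number $1.
render_shard() {
  n=$1
  cat <<EOF
---
apiVersion: v1
kind: Service
metadata:
  name: mongodb-shard${n}-headless-service
  namespace: daemonsl
spec:
  clusterIP: None
  ports:
    - port: 27017
      targetPort: 27017
  selector:
    role: mongodb-shard${n}
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb-shard${n}
  namespace: daemonsl
spec:
  serviceName: mongodb-shard${n}-headless-service
  replicas: 1
  selector:
    matchLabels:
      role: mongodb-shard${n}
  template:
    metadata:
      labels:
        role: mongodb-shard${n}
        tier: maindb
    spec:
      containers:
        - name: mongodb-shard${n}-container
          image: mongo
          # --shardsvr and --replSet are assumptions; the replica set name
          # must match the one passed to rs.initiate() in Step 4.
          command: ["mongod", "--port", "27017", "--dbpath", "/mongo-disk", "--bind_ip", "0.0.0.0", "--shardsvr", "--replSet", "Shard${n}"]
          volumeMounts:
            - name: mongodb-shard${n}-persistent-storage-claim
              mountPath: /mongo-disk
  volumeClaimTemplates:
    - metadata:
        name: mongodb-shard${n}-persistent-storage-claim
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
        storageClassName: fast
EOF
}

# Emit the manifests for both shards.
render_shard 1
render_shard 2
```

Generating the manifests from one template keeps the two shards identical apart from their number, and adding a third shard later is a matter of calling render_shard 3.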

Step 3: Deploy Mongos Routers for a Sharded MongoDB Cluster

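Mongos routers are stateless: they read the cluster metadata from the config servers and route client queries to the appropriate shards, so a plain Deployment is sufficient: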

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongos
  namespace: daemonsl
spec:
  replicas: 2
  selector:
    matchLabels:
      role: mongos
  template:
    metadata:
      labels:
        role: mongos
        tier: routers
    spec:
      containers:
        - name: mongos-container
          image: mongo
          command: ["mongos","--port","27017","--bind_ip","0.0.0.0","--configdb","mongodb-configdb-0.mongodb-configdb-headless-service.daemonsl.svc.cluster.local:27019"]

Apply the deployment via kubectl.

Step 4: Configure Sharding in a Sharded MongoDB Cluster

Initiate the replica set for each shard. The pods live in the daemonsl namespace, so pass -n daemonsl. On MongoDB 6.0 and later images, which no longer include the legacy mongo shell, use mongosh in place of mongo:

kubectl exec -n daemonsl mongodb-shard1-0 -c mongodb-shard1-container -- mongo --eval 'rs.initiate({_id: "Shard1", members: [{_id: 0, host: "mongodb-shard1-0.mongodb-shard1-headless-service.daemonsl.svc.cluster.local:27017"}]})'

kubectl exec -n daemonsl mongodb-shard2-0 -c mongodb-shard2-container -- mongo --eval 'rs.initiate({_id: "Shard2", members: [{_id: 0, host: "mongodb-shard2-0.mongodb-shard2-headless-service.daemonsl.svc.cluster.local:27017"}]})'

Add the shards to the cluster through one of the mongos routers:

kubectl exec -n daemonsl <mongos-pod> -c mongos-container -- mongo --eval 'sh.addShard("Shard1/mongodb-shard1-0.mongodb-shard1-headless-service.daemonsl.svc.cluster.local:27017")'
kubectl exec -n daemonsl <mongos-pod> -c mongos-container -- mongo --eval 'sh.addShard("Shard2/mongodb-shard2-0.mongodb-shard2-headless-service.daemonsl.svc.cluster.local:27017")'

You can verify the result by running sh.status() in the Mongo shell against a mongos router. The same pattern extends to more shards: deploy another StatefulSet, initiate its replica set, and register it with sh.addShard().
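
To illustrate how the registration step extends to more shards, the following sketch prints (but does not run) the kubectl command for each shard number it is given. The helper names add_shard_js and print_add_shard_commands are hypothetical, and the script assumes every shard follows the same mongodb-shardN naming as the two shards above.

```shell
# Print the sh.addShard() expression for shard number $1.
add_shard_js() {
  host="mongodb-shard$1-0.mongodb-shard$1-headless-service.daemonsl.svc.cluster.local:27017"
  printf 'sh.addShard("Shard%s/%s")\n' "$1" "$host"
}

# Print (not run) the kubectl command for each shard number given.
# Usage: print_add_shard_commands <mongos-pod> <shard-number>...
print_add_shard_commands() {
  pod=$1; shift
  for n in "$@"; do
    printf "kubectl exec -n daemonsl %s -c mongos-container -- mongo --eval '%s'\n" "$pod" "$(add_shard_js "$n")"
  done
}

# "<mongos-pod>" is a placeholder; list the real pod names with:
#   kubectl get pods -n daemonsl -l role=mongos
print_add_shard_commands "<mongos-pod>" 1 2 3
```

Printing the commands first makes it easy to review the host names before anything is changed in the cluster.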

Step 5: Test and Clean Up a Sharded MongoDB Cluster

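Enable sharding for a database and check the cluster state. Data is only distributed across shards once a collection in that database is sharded with sh.shardCollection():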

sh.enableSharding("mydb")
sh.status()
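
To actually see documents spread across the shards, a collection has to be sharded and filled with some data. The sketch below writes a small Mongo shell script and defines a helper that feeds it to a mongos pod. The collection name events, the hashed _id shard key, the file name shard-test.js, and the helper run_shard_test are all hypothetical choices for this test.

```shell
# Write a small Mongo shell script; "events" and the hashed _id key are
# hypothetical choices for this test.
cat > shard-test.js <<'EOF'
sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { _id: "hashed" });
var docs = [];
for (var i = 0; i < 1000; i++) { docs.push({ n: i }); }
db.getSiblingDB("mydb").events.insertMany(docs);
db.getSiblingDB("mydb").events.getShardDistribution();
EOF

# Feed the script to the shell inside a mongos pod (pass the pod name as $1).
run_shard_test() {
  kubectl exec -i -n daemonsl "$1" -c mongos-container -- mongo --quiet < shard-test.js
}
```

Calling run_shard_test with the name of a mongos pod should print how many documents each shard holds. A hashed key is used here because it tends to spread sequential inserts evenly.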

To avoid unnecessary charges, delete the Kubernetes resources and the GCE disks when you are done, for example with a teardown script.
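
The teardown script itself is not included in this guide. A minimal sketch, assuming the namespace, StorageClass, and disk names from Step 1, could look like the following. The run helper and the DRY_RUN variable are hypothetical, and by default the script only prints what it would do.

```shell
# Run a command, or only print it while DRY_RUN is 1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Deleting the namespace removes the StatefulSets, Deployment, Services, and PVCs in it.
run kubectl delete namespace daemonsl
# StorageClasses and PersistentVolumes are cluster-scoped, so remove them separately;
# delete any PersistentVolumes you created by name with "kubectl delete pv <name>".
run kubectl delete storageclass fast
# Finally remove the GCE disks created in Step 1.
run gcloud compute disks delete pd-ssd-disk-10g-1 pd-ssd-disk-10g-2 pd-ssd-disk-5g-1 --quiet
```

Review the printed commands first, then run the script again with DRY_RUN=0 to execute them.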

Key Takeaways for a Sharded MongoDB Cluster

Deploying a Sharded MongoDB Cluster on GKE requires StatefulSets, PersistentVolumes, and headless services.
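MongoDB replica sets provide high availability for each shard once they have more than one member; the single-member sets in this guide are a starting point.
Mongos routers use the metadata held by the config servers to route queries to the right shards.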
ZippyOPS can streamline cluster deployment with consulting, implementation, and managed services across DevOps, MLOps, AIOps, Cloud, Microservices, Infrastructure, and Security. Learn more about our products.
Explore demos on our YouTube channel.

For professional guidance and cluster management, contact us at sales@zippyops.com.