Deploying OpenEBS Mayastor on Talos Linux: A Production Guide

Robert Rotter Apr 20, 2026

Deploying high-performance stateful workloads in Kubernetes requires a storage layer that doesn't bottleneck on the kernel. OpenEBS Mayastor, leveraging NVMe over Fabrics (NVMe-oF), bypasses traditional storage overhead to deliver near bare-metal IOPS directly to pods.

However, pairing Mayastor with Talos Linux introduces specific architectural friction. Talos is built on a secure, immutable, API-driven foundation that restricts hardware and kernel access. Mayastor requires low-level privileges to bypass the kernel and manage NVMe devices directly.

This guide documents the exact configurations, deployment manifests, and production realities required to bridge that gap and build a resilient, high-performance storage backend.

Cluster Architecture & Prerequisites

  • Distribution: Talos Linux
  • Storage Nodes: 6 nodes (talos86-92, excluding talos91)
  • Backend: NVMe SSDs (Dedicated per node)
  • Replication Factor: 3
  • Protocol: NVMe-oF (TCP)

Hardware Requirements

  • Minimum 3 nodes with physical NVMe drives.
  • 2+ CPU cores per storage node.
  • 2GiB hugepages (1024 x 2MiB pages) per storage node.
  • NVMe-TCP support (compiled into the Talos kernel rather than loadable modules, which requires specific Helm overrides).
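The hugepage requirement above is just pages times page size, so it can be sanity-checked with plain shell arithmetic (values taken from the list above):

```shell
# 1024 pages x 2 MiB per page = 2048 MiB = 2 GiB per storage node
pages=1024
page_size_mib=2
total_mib=$((pages * page_size_mib))
echo "${total_mib} MiB ($((total_mib / 1024)) GiB)"
```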

Core Talos Configuration

Talos requires specific machine configuration overrides to support Mayastor's polling architecture and memory requirements.

Control Plane: Pod Security Exemptions

Talos enforces baseline Pod Security Standards (PSS) by default. Mayastor requires privileged access to manage NVMe devices and manipulate network namespaces. Create cp.yaml to exempt the OpenEBS namespace:

cluster:
  apiServer:
    admissionControl:
      - name: PodSecurity
        configuration:
          apiVersion: pod-security.admission.config.k8s.io/v1beta1
          kind: PodSecurityConfiguration
          exemptions:
            namespaces:
              - openebs

(Note: The OpenEBS namespace itself must also be labeled with pod-security.kubernetes.io/enforce: privileged. Our ArgoCD manifest below handles this automatically via managedNamespaceMetadata.)
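If you are not deploying via ArgoCD (or want the labels in place before the first sync), the equivalent labeling can be applied with a plain Namespace manifest; a minimal sketch:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openebs
  labels:
    # Same Pod Security labels the ArgoCD manifest applies via managedNamespaceMetadata
    pod-security.kubernetes.io/enforce: privileged
    pod-security.kubernetes.io/audit: privileged
    pod-security.kubernetes.io/warn: privileged
```

The same result can be achieved imperatively with kubectl label on an existing namespace.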

Worker Nodes: Memory and Mount Propagation

Create wp.yaml for your designated storage nodes to allocate hugepages, set scheduling labels, and establish a shared mount for volume attachment.

machine:
  sysctls:
    vm.nr_hugepages: "1024"
  nodeLabels:
    openebs.io/engine: "mayastor"
  kubelet:
    extraMounts:
      - destination: /var/local
        type: bind
        source: /var/local
        options:
          - bind
          - rshared
          - rw

Architectural Note: The rshared option on the /var/local mount is strictly required. Without mount propagation, the kubelet cannot access the host paths for volume mounting, and containers will not see attached volumes.

Apply these configurations:

talosctl patch --mode=no-reboot machineconfig -n <control-plane-ip> --patch @cp.yaml
talosctl patch --mode=no-reboot machineconfig -n <worker-node-ip> --patch @wp.yaml

# Kubelet must be restarted to register hugepages
talosctl -n <node-ip> service kubelet restart

Infrastructure as Code: ArgoCD Deployment

Below is the ArgoCD Application manifest used to deploy OpenEBS. Note that the Helm repository changed in October 2024 to https://openebs.github.io/openebs.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: openebs
  namespace: argocd
spec:
  destination:
    namespace: openebs
    server: https://kubernetes.default.svc
  project: default
  source:
    chart: openebs
    helm:
      values: |-
        engines:
          local:
            lvm:
              enabled: false
            zfs:
              enabled: false
          replicated:
            mayastor:
              enabled: true

        localprovisioner:
          enableHostpathClass: false

        mayastor:
          localpv-provisioner:
            enabled: false
            hostpathClass:
              enabled: false

          io_engine:
            envcontext: "iova-mode=pa"
            logLevel: debug
            resources:
              limits:
                cpu: "10"
                hugepages2Mi: "2Gi"
                hugepages1Gi: '0'
              requests:
                hugepages2Mi: "2Gi"
                hugepages1Gi: '0'
            nodeSelector:
              openebs.io/engine: mayastor
              kubernetes.io/arch: amd64

          csi:
            node:
              initContainers:
                enabled: false

          crds:
            csi:
              volumeSnapshots:
                enabled: true

          etcd:
            replicaCount: 3
            auth:
              token:
                enabled: false
              rbac:
                create: false
                allowNoneAuthentication: true
          loki:
            enabled: false
          alloy:
            enabled: false

    repoURL: https://openebs.github.io/openebs
    targetRevision: '*'
  syncPolicy:
    managedNamespaceMetadata:
      labels:
        pod-security.kubernetes.io/audit: privileged
        pod-security.kubernetes.io/enforce: privileged
        pod-security.kubernetes.io/warn: privileged
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true

Critical Overrides:

  • initContainers.enabled: false: Standard Mayastor init containers check for loadable kernel modules and will fail on Talos, which compiles NVMe modules directly into the kernel.
  • envcontext: "iova-mode=pa": Sets the IOVA mode for optimal physical address translation.

Storage Configuration

DiskPools

DiskPools define which physical disks Mayastor can use. They operate on a strict one-pool-per-node architecture.

Identify available NVMe disks and their persistent paths:

talosctl -n <node-ip> get disks
talosctl -n <node-ip> ls /dev/disk/by-id/

⚠️ CRITICAL WARNING: Never use dynamic kernel device names (/dev/nvme0n1 or /dev/sda) in DiskPool configurations. These names are dynamically assigned at boot. If a node reboots and the kernel enumerates the device differently, the DiskPool will fail and data integrity cannot be guaranteed. Always map the WWID shown in Talos to the /dev/disk/by-id/ persistent path.
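The mapping is mechanical: udev derives the stable by-id symlink for an NVMe namespace by prefixing the WWID with nvme-, as the DiskPool examples below illustrate. A quick sketch of the construction (WWID value taken from the example pools; always verify the resulting path with talosctl ls on the node):

```shell
# WWID as reported by `talosctl get disks` (example value)
wwid="eui.002538b221b7373f"
# udev's stable symlink for an NVMe namespace is "nvme-" + WWID
echo "/dev/disk/by-id/nvme-${wwid}"
```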

apiVersion: openebs.io/v1beta3
kind: DiskPool
metadata:
  name: talos86-nvme-pool
  namespace: openebs
spec:
  node: talos86.example.com
  disks: ["/dev/disk/by-id/nvme-eui.002538b221b7373f"]
---
apiVersion: openebs.io/v1beta3
kind: DiskPool
metadata:
  name: talos87-nvme-pool
  namespace: openebs
spec:
  node: talos87.example.com
  disks: ["/dev/disk/by-id/nvme-eui.002538b911b54815"]
# (Repeat for nodes 88, 89, 90, 92)

StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: mayastor-nvmf
provisioner: io.openebs.csi-mayastor
parameters:
  protocol: 'nvmf'
  repl: '3'
  ioTimeout: '60'
  local: 'true'
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

Key Parameters Explained:

  • protocol: 'nvmf': Ensures traffic routes over NVMe-over-TCP rather than legacy iSCSI.
  • repl: '3': Enforces 3-way synchronous replication across different node pools.
  • local: 'true': Instructs Mayastor to prioritize reading from the local node's replica if one exists, drastically reducing read latency.
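A PVC consuming this class looks like any other; a minimal sketch (name, namespace, and size are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-app-data          # placeholder name
  namespace: default
spec:
  storageClassName: mayastor-nvmf
  accessModes:
    - ReadWriteOnce          # Mayastor volumes are RWO
  resources:
    requests:
      storage: 10Gi
```

Note that with repl: '3', this 10Gi claim consumes 10Gi on each of three separate DiskPools.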

Application Example: Loki StatefulSet

When deploying StatefulSets via ArgoCD, Kubernetes mutates .spec.volumeClaimTemplates with default values, causing ArgoCD to report OutOfSync drift. We handle this with ignoreDifferences (via jqPathExpressions, or alternatively jsonPointers with /spec/volumeClaimTemplates/0) combined with ServerSideApply.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: loki
  namespace: argocd
spec:
  destination:
    namespace: loki
    server: https://kubernetes.default.svc
  project: default
  ignoreDifferences:
    - group: "apps"
      kind: StatefulSet
      name: loki
      namespace: loki
      jqPathExpressions:
        - ".spec.volumeClaimTemplates"
  source:
    chart: loki
    helm:
      values: |-
        deploymentMode: SingleBinary
        singleBinary:
          replicaCount: 1
          persistence:
            storageClass: "mayastor-nvmf"
        # ... (remaining Loki config)
    repoURL: https://grafana.github.io/helm-charts
    targetRevision: '*'
  syncPolicy:
    syncOptions:
    - CreateNamespace=true
    - ServerSideApply=true

Production Realities: Resolving Architectural Friction

In deployment, theory rarely survives contact with the cluster. Below are the precise technical resolutions to the edge cases encountered during this integration.

1. ETCD Pod Mounting Failures

If openebs-etcd-* pods fail with path /var/local/... doesn't exist, the Talos kubelet lacks access to the required host paths. The fix is the /var/local extra mount with rshared configured in wp.yaml (detailed above).

2. Hugepages Allocation Failures

If io-engine pods hang with insufficient hugepages-2Mi:

  1. Verify Talos OS allocation: talosctl -n <node-ip> read /proc/meminfo | grep Huge
  2. Restart kubelet: talosctl -n <node-ip> service kubelet restart
  3. Verify Kubernetes visibility: kubectl get node <node-name> -o json | jq '.status.allocatable'

3. DiskPool Creation Failures

If a DiskPool shows a "Failed" status, the disk likely contains an existing partition table or filesystem. On Talos 1.9.0+, wipe the disk directly:

talosctl wipe disk nvme0n1 -n <node-ip>

4. Upgrade Networking Timeouts

Upgrading Talos from 1.6.x to 1.7.x can cause the OS to drop the metrics exporter's gRPC connection to the io-engine, leaving the pod in CrashLoopBackOff. Either upgrade with the preserve flag: talosctl -n <node-ip> upgrade --preserve (the default behavior in 1.8+), or temporarily disable metrics in Helm prior to the upgrade.

5. CPU Allocation vs. SPDK Polling & CFS Throttling

If io-engine is assigned a limit of 10 CPU cores but only utilizes ~200% (2 cores), or if you are experiencing severe I/O latency spikes under load, the issue lies in Mayastor's SPDK architecture and Kubernetes CFS quotas.

SPDK uses a userspace polling architecture. It ignores Kubernetes CPU limits entirely for thread count, relying on the MAYASTOR_CPUS environment variable. However, because SPDK continuously polls for I/O, it rapidly exhausts its Kubernetes CFS quota and gets throttled by the kernel, injecting latency. Verify this by running:

kubectl exec -n openebs openebs-io-engine-xxxxx -- cat /sys/fs/cgroup/cpu.stat
# (cgroup v2 path, as used by Talos; on cgroup v1 hosts it is /sys/fs/cgroup/cpu/cpu.stat)
# High 'nr_throttled' values confirm the issue.
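To turn the raw counters into a quick throttle ratio, you can post-process the cpu.stat output locally; the counter values below are made up for illustration (real numbers come from the kubectl exec above):

```shell
# Sample cpu.stat contents in the cgroup v2 key/value format
stat="nr_periods 10000
nr_throttled 4200
throttled_usec 120000000"

periods=$(echo "$stat"   | awk '/^nr_periods/   {print $2}')
throttled=$(echo "$stat" | awk '/^nr_throttled/ {print $2}')
# Percentage of CFS scheduling periods in which the io-engine was throttled
echo "throttled in $((100 * throttled / periods))% of scheduling periods"
```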

The optimal engineering solution: On dedicated storage nodes, remove CPU limits entirely. Allow MAYASTOR_CPUS to bound SPDK usage natively while leaving requests for the Kubernetes scheduler:

mayastor:
  io_engine:
    resources:
      limits:
        # Omit CPU limit entirely to bypass CFS throttling
        hugepages2Mi: "2Gi"
      requests:
        cpu: "2"              # Used only for pod scheduling
        hugepages2Mi: "2Gi"
    env:
      MAYASTOR_CPUS: "0-5"    # SPDK natively bounds to 6 cores. Formats: "0-5" or "0,2,4"

Verification: You can ensure the SPDK reactor threads have successfully bound to your assigned cores by checking process affinity:

kubectl exec -n openebs openebs-io-engine-xxxxx -- ps -eLo psr,comm | grep reactor

6. The Mental Model: Mayastor vs. Ceph

Engineers accustomed to Ceph often attempt to create a single, cluster-wide DiskPool. Mayastor explicitly rejects this. Mayastor utilizes a One Pool Per Node architecture. Replication is managed exclusively by the Volume layer, not the underlying pool layer.

Capacity Planning Reality Check: Because Mayastor replicates at the volume level, a 100GB PVC with repl: 3 will pull 100GB from three separate node pools. If you have 3 nodes with 500GB drives each (1500GB total raw), your maximum usable capacity across the cluster is ~500GB. Think of DiskPools as "storage donations" from each node; the volume layer handles the replication tax.
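The capacity math above reduces to dividing total raw capacity by the replication factor; a quick shell check using the numbers from the example:

```shell
nodes=3; raw_per_node_gb=500; repl=3
total_raw=$((nodes * raw_per_node_gb))   # 1500 GB raw across the cluster
usable=$((total_raw / repl))             # every provisioned GB lands on repl pools
echo "raw=${total_raw}GB usable=${usable}GB"
```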


Safe Migration and Maintenance Strategy

Migrating production stateful workloads is a high-risk operation. We execute this in strict phases.

Pre-Flight Validation (fio testing)

Before touching production data, validate the raw IOPS and stability:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mayastor-test
spec:
  storageClassName: mayastor-nvmf
  accessModes: [ReadWriteOnce]
  resources: { requests: { storage: 50Gi } }
---
apiVersion: v1
kind: Pod
metadata:
  name: fio-test
spec:
  containers:
  - name: fio
    image: ljishen/fio
    command:
      - fio
      - --name=randrw-test
      - --filename=/data/testfile
      - --size=40G
      - --direct=1
      - --rw=randrw
      - --bs=4k
      - --ioengine=libaio
      - --iodepth=16
      - --runtime=3600
      - --numjobs=4
      - --time_based
      - --group_reporting
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: mayastor-test

Phased Migration

  1. Low Risk (Weeks 1-2): Migrate stateless caches or easily backed-up UI configurations (Grafana, Prometheus, Alertmanager).
  2. Medium Risk (Month 2): Migrate internal tooling (Jenkins, Harbor).
  3. High Risk (Month 3+): Migrate transactional databases (PostgreSQL, MySQL, Kafka) only after proving absolute cluster stability. Require full offline backups (pg_dump) prior to scaling down the original workloads.

Executing the Copy: To safely move data between storage classes, scale your application to 0, then use an Alpine helper pod to synchronize the data from the old PVC to the newly provisioned Mayastor PVC:

kubectl run -n default data-copy --rm -i --image=alpine --overrides='
{
  "spec": {
    "containers": [{
      "name": "copy", 
      "image": "alpine", 
      "command": ["sh", "-c", "cp -av /old/* /new/ && sync"],
      "volumeMounts": [
        {"name": "old", "mountPath": "/old"}, 
        {"name": "new", "mountPath": "/new"}
      ]
    }],
    "volumes": [
      {"name": "old", "persistentVolumeClaim": {"claimName": "old-storage-pvc"}},
      {"name": "new", "persistentVolumeClaim": {"claimName": "mayastor-new-pvc"}}
    ],
    "restartPolicy": "Never"
  }
}'

Quick Rollback: Always retain the old PVC until the application has run stably on Mayastor for at least 48 hours. If the application fails to start or exhibits high latency, immediately revert to the old storage by patching the deployment:

kubectl patch deployment <app-name> --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/volumes/0/persistentVolumeClaim/claimName", "value": "old-storage-pvc"}]'

Node Maintenance and io-engine Rebuilds

Never arbitrarily kill an io-engine pod without checking cluster health. Killing a pod holding the only healthy replica of a volume will result in immediate data unavailability. (Note: If you have provisioned a volume with repl: 1, killing the io-engine pod on that node results in immediate, guaranteed downtime).

  1. Verify Health: Ensure no degraded volumes exist.

    kubectl mayastor get volumes -n openebs
  2. Evict Safely: Use the Mayastor plugin to cordon and drain. This instructs the control plane to synchronize replicas before allowing the node to shut down.

    kubectl mayastor cordon node <node-name>
    kubectl mayastor drain node <node-name>

    Warning: Never use standard kubectl drain on a Mayastor storage node. The standard Kubernetes drain is completely unaware of Mayastor's replication geometry and will blindly evict pods, potentially causing immediate data loss if multiple replicas are taken offline.

  3. Monitor Rebuilds: When the node returns, SPDK will automatically sync stale blocks. Wait for the volume status to shift from REBUILDING back to Online and 3/3 before proceeding to the next node.

    To view the actual synchronization percentage during a rebuild, inspect the volume's JSON output:

    kubectl mayastor get volume <volume-id> -o json | jq '.spec.target.rebuilding'
  4. Return to Service: Once the volume status returns to 3/3 and Online, allow Mayastor to place new volumes on the node again:

    kubectl mayastor uncordon node <node-name>

Verification Commands

Validate the deployment status using the following operational commands:

# Verify component health
kubectl get pods -n openebs

# Verify DiskPool capacity and state
kubectl get diskpools -n openebs

# Validate node topology and availability
kubectl mayastor get nodes

# Check live replica health mapping
kubectl mayastor get volume-replica-topologies

Conclusion

Deploying OpenEBS Mayastor on Talos Linux requires precise configuration, but the result is an exceptionally fast, highly resilient stateful backend that respects the security posture of the cluster. By implementing the node-level sysctls, managing SPDK CPU affinity natively, and adhering to strict maintenance protocols, this architecture handles enterprise workloads with predictability and speed.

At Deviqon Labs, we rely on deep technical precision to solve complex infrastructure challenges. Sharp Minds, Solid Code.

Contact our engineering team to discuss your Kubernetes architecture.
