Zero Trust Security for Microservices

“Zero Trust” has become one of the most overloaded terms in security. It gets applied to network products, identity platforms, and compliance frameworks, often with the underlying principle — never trust, always verify — buried under marketing. This post tries to be concrete: what does Zero Trust mean for a microservices architecture, and how do you actually implement it?

The Problem with Implicit Trust

In a traditional perimeter model, traffic inside the network is implicitly trusted. In a microservices architecture, this creates a large blast radius: if one service is compromised, the attacker can freely call any other service on the internal network.

This is the specific problem Zero Trust addresses. The goal is to eliminate implicit trust between services, requiring every communication to be authenticated, authorized, and (where sensitive) encrypted — regardless of where it originates.

The Three Pillars

1. Workload Identity

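Every service needs a cryptographically verifiable identity. In Kubernetes, this is typically provided by a service mesh that issues identities in the SPIFFE (Secure Production Identity Framework for Everyone) format, either from the mesh’s own certificate authority or from SPIRE, the SPIFFE reference implementation.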

Each workload gets a short-lived X.509 certificate that carries its identity as a SPIFFE ID in the URI subject alternative name (e.g., spiffe://cluster.local/ns/shop/sa/checkout). These certificates are rotated automatically and tied to the Kubernetes service account, not to a long-lived secret that can be stolen.
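
To make the link between service account and identity concrete, here is a minimal sketch of the Kubernetes side. It assumes a mesh such as Istio that derives the SPIFFE ID from the namespace and service account, and the default cluster.local trust domain. The names (shop, checkout) follow the authorization policy example in this post; the image is a placeholder.

```yaml
# The service account is the anchor of the workload identity.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: checkout
  namespace: shop
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout
  namespace: shop
spec:
  replicas: 1
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
    spec:
      # Pods run as this service account, so a mesh that follows the
      # ns/<namespace>/sa/<service-account> convention would issue them
      # spiffe://cluster.local/ns/shop/sa/checkout.
      serviceAccountName: checkout
      containers:
      - name: checkout
        image: registry.example.com/checkout:1.0.0  # hypothetical image
```

Giving each service its own service account matters here: workloads that share the default service account also share an identity, and policies can no longer tell them apart.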

2. Mutual TLS (mTLS)

Once workloads have identities, mTLS ensures that both sides of every service-to-service connection verify each other’s certificate before any data is exchanged.

Without mTLS: Service A calls Service B. Service B has no way to verify that the caller is actually Service A.

With mTLS: Service A presents its SPIFFE certificate. Service B verifies it against the trust bundle. Only then does the connection proceed.

3. Authorization Policy

Authentication (verifying identity) is necessary but not sufficient. Authorization (deciding what an authenticated identity is allowed to do) completes the model.

In Istio, authorization policies define which services can talk to which, at the HTTP level:

apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-checkout-only
  namespace: payments
spec:
  selector:
    matchLabels:
      app: payment-service
  rules:
  - from:
    - source:
        principals: ["cluster.local/ns/shop/sa/checkout"]
    to:
    - operation:
        methods: ["POST"]
        paths: ["/v1/payments"]

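This policy says: only the checkout service account in the shop namespace may call POST /v1/payments on the payment service. Because an ALLOW policy now selects the workload, any request that matches none of its rules is denied. The principals field is only populated for mTLS connections, so this policy depends on the previous pillar being in place.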
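
A common companion to a policy like this is an explicit namespace-wide default deny, so that workloads with no ALLOW policy of their own aren’t left open. The sketch below uses Istio’s empty-spec pattern; the policy name is an assumption.

```yaml
# An ALLOW policy (the default action) with no rules matches nothing,
# so every request to workloads in the payments namespace is denied
# unless another ALLOW policy permits it.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: payments
spec: {}
```

With this in place, every new service in the namespace starts closed and has to be opened deliberately.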

Implementing with a Service Mesh

A service mesh is the practical implementation layer for Zero Trust in Kubernetes. Two widely used options are:

Istio

Istio is the most feature-rich option. It provides mTLS, authorization policies, traffic management, and observability. The learning curve is real, but it gives you fine-grained control.

# Enable strict mTLS for an entire namespace
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments
spec:
  mtls:
    mode: STRICT

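With STRICT mode, workloads in the namespace accept only mutual TLS connections; a client that connects in plaintext or without a valid certificate is rejected. Exceptions have to be declared explicitly, with a workload-level or port-level PeerAuthentication.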

Cilium (with Hubble)

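Cilium operates at the kernel level using eBPF, providing network policy, identity-aware connectivity, and deep observability without a sidecar proxy. For teams already using Cilium for networking, identity-based policies are a natural extension; mutual authentication is a separate feature that has to be enabled. The policy below restricts ingress by workload identity (labels) rather than by IP address: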

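# Allow only checkout pods in the shop namespace to reach payment-service on TCP 8080
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-checkout-to-payments
  namespace: payments  # the policy is namespaced; same namespace as the Istio example
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: checkout
        # Without a namespace label, fromEndpoints only matches pods
        # in the policy's own namespace.
        io.kubernetes.pod.namespace: shop
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP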
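
For parity with the Istio policy earlier, Cilium can also restrict the HTTP method and path, and can require mutual authentication between the two identities. The following is a sketch rather than a drop-in policy: it assumes Cilium’s mutual authentication feature is enabled in the cluster and that checkout runs in the shop namespace as in the Istio example. The policy name is made up for illustration.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-checkout-post-payments  # hypothetical name
  namespace: payments
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: checkout
        # Select checkout pods in the shop namespace (assumption).
        io.kubernetes.pod.namespace: shop
    # Require a successful mutual authentication handshake between the
    # two identities; only effective if the feature is enabled.
    authentication:
      mode: "required"
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      # L7 rule: only POST /v1/payments is allowed on this port.
      rules:
        http:
        - method: "POST"
          path: "/v1/payments"
```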

Beyond Service-to-Service: The Full Picture

Service mesh handles east-west (internal) traffic. A complete Zero Trust microservices architecture also addresses:

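North-south traffic (ingress): Authenticate all external requests at the gateway. Use short-lived tokens (JWT/OAuth2), validate them at the edge, and propagate the verified identity inward. Services shouldn’t make the user authenticate again, but each one should still verify the propagated token rather than trust it blindly.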
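
In Istio, edge validation of this kind can be sketched with a RequestAuthentication plus an AuthorizationPolicy on the ingress gateway. The issuer and JWKS URLs are hypothetical, and the selector assumes the default gateway deployment labelled istio: ingressgateway in istio-system.

```yaml
# Validate JWTs from the (hypothetical) identity provider at the gateway.
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  jwtRules:
  - issuer: "https://auth.example.com"
    jwksUri: "https://auth.example.com/.well-known/jwks.json"
    # Keep the token on the request so upstream services can verify it too.
    forwardOriginalToken: true
---
# RequestAuthentication alone only rejects invalid tokens; this policy
# also rejects requests that carry no token at all.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"]
```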

Secrets access: Services should authenticate to the secrets store (Vault, cloud KMS) using their workload identity, not a stored credential. Dynamic secrets with short TTLs mean a compromised service can’t use stolen credentials for long.
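
One way to do this with Vault on Kubernetes is the Vault Agent Injector, which logs in with the pod’s service account token and renders secrets into the pod. The sketch assumes the injector is installed, that a Vault Kubernetes auth role named payment-service exists, and that a database secrets engine serves database/creds/payments. All three names, and the image, are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payment-service
  namespace: payments
spec:
  replicas: 1
  selector:
    matchLabels:
      app: payment-service
  template:
    metadata:
      labels:
        app: payment-service
      annotations:
        vault.hashicorp.com/agent-inject: "true"
        # Vault role bound to this service account (hypothetical name).
        vault.hashicorp.com/role: "payment-service"
        # Renders short-lived database credentials to
        # /vault/secrets/db-creds inside the pod.
        vault.hashicorp.com/agent-inject-secret-db-creds: "database/creds/payments"
    spec:
      serviceAccountName: payment-service
      containers:
      - name: payment-service
        image: registry.example.com/payment-service:1.0.0  # hypothetical image
```

No Vault token or database password is stored in the manifest or in a Kubernetes Secret; the only credential the pod starts with is its own service account identity.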

Network segmentation: Even with mTLS, namespace-level network policies provide defense in depth. Services in the payments namespace shouldn’t be reachable from the logging namespace by default.
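
With plain Kubernetes NetworkPolicy, this might look like the following pair: a default deny for the namespace and one explicit opening. It assumes a CNI plugin that enforces NetworkPolicy and a cluster recent enough to set the kubernetes.io/metadata.name label on namespaces automatically; the policy names are illustrative.

```yaml
# Deny all ingress to pods in the payments namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# Then allow only the shop namespace to reach payment-service.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-shop-to-payment-service
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: shop
```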

Observability: You Can’t Protect What You Can’t See

Zero Trust without observability is incomplete. A service mesh gives you:

mTLS status — which connections are encrypted, which aren’t
Identity telemetry — which service is calling which, with what frequency
Policy violations — denied connections logged with source and destination identity
Anomaly detection — baseline normal communication patterns, alert on deviations

Hubble (Cilium’s observability layer) and Kiali (Istio’s dashboard) both provide this. Export this telemetry to your SIEM.
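
As one example of producing that telemetry, Istio’s Telemetry API can turn on access logs for denied requests mesh-wide. This is a sketch: it assumes the built-in envoy log provider and istio-system as the root namespace, and the resource name is an assumption. Shipping the resulting logs to a SIEM is left to whatever log collector the cluster already runs.

```yaml
apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: log-denied-requests
  namespace: istio-system  # root namespace, so this applies mesh-wide
spec:
  accessLogging:
  - providers:
    - name: envoy
    # Only log requests rejected with 403, which is what an
    # AuthorizationPolicy denial returns for HTTP traffic.
    filter:
      expression: "response.code == 403"
```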

A Practical Migration Path

You don’t have to enable strict mTLS everywhere on day one. A staged approach:

1. Permissive mode: mTLS where both sides support it, plaintext fallback. No disruption, start gathering visibility.
2. Audit: use the observability layer to identify all non-mTLS communication and address gaps.
3. Strict mode: enable STRICT mTLS namespace by namespace, starting with high-risk services.
4. Authorization policies: add fine-grained allow-lists once mTLS is stable.
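
In Istio terms, the permissive and strict stages differ by a single field. A sketch of the starting point, reusing the payments namespace from the earlier examples:

```yaml
# Starting point: accept both mTLS and plaintext in the namespace.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: payments
spec:
  mtls:
    mode: PERMISSIVE
```

Once the audit shows no remaining plaintext callers, changing mode to STRICT moves the namespace to the strict stage.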