Security: yashyaadav/EFK_setup

docs/security.md

Security model

Threat model

This setup protects against an attacker who lands a pod on the cluster and tries to read or write logs they shouldn't, sniff traffic between EFK components, or pivot from a compromised application pod into Elasticsearch. NetworkPolicies isolate the EFK pods, TLS prevents passive sniffing of log content in flight, and authentication ensures arbitrary cluster workloads can't write or query data without the elastic credential.

It does not protect against a compromised Kubernetes control plane, a compromised Fluentd ServiceAccount token (which grants cluster-wide read on pods and namespaces, the metadata Fluentd needs), or an attacker who holds the elastic-credentials Secret. Those credentials carry the same blast radius as ES itself.

It also does not ship audit logs anywhere durable. Kubernetes API audit logs and ES audit logs are both off; turning them on and capturing them to a separate index is a sensible next step.

TLS architecture

A locally-generated, self-signed CA signs a single node certificate used by all Elasticsearch pods. The full PKCS12 keystore (elastic-certificates.p12) ships to ES via the es-tls Secret; just the CA (ca.crt) ships to Kibana and Fluentd via the es-ca Secret. Splitting the CA from the keystore is least-privilege — Kibana and Fluentd verify the server cert but never possess the private key.

Two TLS layers are enabled:

  • xpack.security.transport.ssl — node-to-node traffic on port 9300 (cluster join, replication, search coordination).
  • xpack.security.http.ssl — client traffic on port 9200 (Kibana queries, Fluentd writes, _cluster/health probes).

Both use verification_mode: certificate, which validates the cert chain back to the CA but doesn't require hostname matches — appropriate here because the same node cert is reused across all StatefulSet pods and is signed for all the DNS names the pods present (elasticsearch, elasticsearch.logging.svc.cluster.local, elastic-client..., localhost).
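As a rough illustration, the two layers could look like this in elasticsearch.yml. The setting names are standard Elasticsearch 7.x settings; the keystore path is an assumption, since the text above names the file but not where the es-tls Secret is mounted.

```yaml
# Sketch only. Paths are assumed, relative to the ES config directory.
xpack.security.enabled: true

# Layer 1: node-to-node traffic on 9300
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12

# Layer 2: client traffic on 9200
xpack.security.http.ssl.enabled: true
xpack.security.http.ssl.verification_mode: certificate
xpack.security.http.ssl.keystore.path: certs/elastic-certificates.p12
```

If the PKCS12 file is password-protected, the password would normally go in the Elasticsearch keystore (secure settings) rather than in this file.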

Cert rotation

Elasticsearch 7.14, as deployed here, does not pick up new TLS keystores without a restart. To rotate:

  1. rm -rf certs/ && make tls-certs
  2. make secrets (re-creates es-tls and es-ca with the new material)
  3. kubectl -n logging rollout restart statefulset/es-cluster deployment/kibana daemonset/fluentd

Rotation is manual and disruptive — this is one of the strongest reasons to move to ES 8.x or ECK, which manage the cert lifecycle. cert-manager + a custom Issuer would also work and is the path I'd take next.
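A minimal sketch of the cert-manager route, assuming cert-manager is installed and a CA-backed Issuer already exists. The Issuer name, Secret names, and durations are hypothetical; the DNS names are the ones listed in the TLS section.

```yaml
# Illustrative only; not applied in this repo.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: es-node-cert              # hypothetical name
  namespace: logging
spec:
  secretName: es-tls-managed      # hypothetical; kept separate from es-tls
  duration: 2160h                 # 90 days (assumed policy)
  renewBefore: 360h               # renew 15 days before expiry (assumed)
  issuerRef:
    name: efk-ca-issuer           # hypothetical CA Issuer
    kind: Issuer
  dnsNames:
    - elasticsearch
    - elasticsearch.logging.svc.cluster.local
    - localhost
  keystores:
    pkcs12:
      create: true                # also emit a PKCS12 bundle in the Secret
      passwordSecretRef:
        name: es-keystore-password   # hypothetical Secret holding the bundle password
        key: password
```

cert-manager writes the bundle under the key keystore.p12, so the ES volume mount or keystore path would need to change to match. Renewal only refreshes the Secret; the pods still have to pick up the new material.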

Authentication

A single user, the built-in elastic superuser, is shared by Kibana, Fluentd, and any operator-level access. The password is generated by make secrets and stored only in the elastic-credentials Secret. ES reads ELASTIC_PASSWORD from that Secret at boot to set the superuser password on first start. Kibana reads it as ELASTICSEARCH_PASSWORD; Fluentd reads it as FLUENT_ELASTICSEARCH_PASSWORD.
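The wiring described above would look roughly like the following container env fragments. The variable names and the Secret name come from this document; the password key matches the secretKeyRef quoted in the cloud secret management section. Everything else about the surrounding pod spec is omitted.

```yaml
# Fragment of the Kibana container spec (sketch).
env:
  - name: ELASTICSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: elastic-credentials
        key: password
---
# Fragment of the Fluentd container spec (sketch).
env:
  - name: FLUENT_ELASTICSEARCH_PASSWORD
    valueFrom:
      secretKeyRef:
        name: elastic-credentials
        key: password
```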

This is a known limitation. The proper version splits responsibilities:

  • kibana_system (built-in, scoped role) for Kibana → ES.
  • A custom fluentd_writer role (the monitor cluster privilege plus the write index privilege on logstash-*) and a matching user for Fluentd.

A one-shot Job that runs after ES is healthy could POST /_security/role/... and POST /_security/user/... to create both. That Job was scoped out of this version for simplicity; the README "Known limitations" section calls it out as future work.
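One possible shape for that Job, as a sketch rather than the repo's implementation. The Job name, image tag, Service hostname, the fluentd-credentials and kibana-credentials Secrets, and the extra create_index privilege are all assumptions not present in this setup.

```yaml
# Illustrative only; not applied here.
apiVersion: batch/v1
kind: Job
metadata:
  name: es-bootstrap-users            # hypothetical
  namespace: logging
spec:
  backoffLimit: 6
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: bootstrap
          image: curlimages/curl:8.8.0   # assumed image and tag
          env:
            - name: ELASTIC_PASSWORD
              valueFrom:
                secretKeyRef: {name: elastic-credentials, key: password}
            - name: FLUENTD_PASSWORD
              valueFrom:
                secretKeyRef: {name: fluentd-credentials, key: password}   # hypothetical Secret
            - name: KIBANA_PASSWORD
              valueFrom:
                secretKeyRef: {name: kibana-credentials, key: password}    # hypothetical Secret
          volumeMounts:
            - {name: es-ca, mountPath: /ca, readOnly: true}
          command: ["/bin/sh", "-ec"]
          args:
            - |
              ES="https://elasticsearch:9200"   # assumed Service name
              post() {
                curl -fsS --cacert /ca/ca.crt -u "elastic:${ELASTIC_PASSWORD}" \
                  -H 'Content-Type: application/json' -X POST "${ES}$1" -d "$2"
              }
              # Scoped role for Fluentd; create_index is an assumed addition.
              post /_security/role/fluentd_writer \
                '{"cluster":["monitor"],"indices":[{"names":["logstash-*"],"privileges":["write","create_index"]}]}'
              post /_security/user/fluentd \
                "{\"password\":\"${FLUENTD_PASSWORD}\",\"roles\":[\"fluentd_writer\"]}"
              # kibana_system is built in; only its password needs setting.
              post /_security/user/kibana_system/_password \
                "{\"password\":\"${KIBANA_PASSWORD}\"}"
      volumes:
        - name: es-ca
          secret: {secretName: es-ca}
```

Two caveats: the NetworkPolicies in this setup only admit Kibana and Fluentd to port 9200, so the Job's pods would need their own allow rule, and passwords containing quotes or backslashes would break the inline JSON.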

RBAC

The only ClusterRole in this stack is for Fluentd:

rules:
- apiGroups: [""]
  resources: [pods, namespaces]
  verbs: [get, list, watch]

That's the minimum required for the kubernetes_metadata_filter plugin to enrich log records with pod and namespace metadata. make smoke step 7 verifies that the Fluentd ServiceAccount cannot delete pods, a negative RBAC scope test that's worth showing reviewers.
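The same check can be expressed declaratively as a SubjectAccessReview. This is a sketch of an equivalent test, not what make smoke actually runs, and it assumes the ServiceAccount is named fluentd in the logging namespace.

```yaml
# Submit with: kubectl create -f <file> -o yaml
# and read status.allowed in the response (false is the expected result).
apiVersion: authorization.k8s.io/v1
kind: SubjectAccessReview
spec:
  user: system:serviceaccount:logging:fluentd   # assumed ServiceAccount name
  resourceAttributes:
    verb: delete
    resource: pods
```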

ES, Kibana, and Fluentd all run with automountServiceAccountToken left at default. Fluentd's token is what gets used by the metadata filter; ES and Kibana don't need it but currently get one. Setting automountServiceAccountToken: false on the ES + Kibana ServiceAccounts is a small additional hardening worth doing.
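A minimal sketch of that hardening, assuming the ES and Kibana ServiceAccounts are named as below (the names are not given in this document):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elasticsearch          # assumed name
  namespace: logging
automountServiceAccountToken: false
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kibana                 # assumed name
  namespace: logging
automountServiceAccountToken: false
```

The same field can be set on the pod spec instead, where it overrides the ServiceAccount value. Fluentd's ServiceAccount should keep the default, since the metadata filter needs the token.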

PodSecurity Admission

The logging namespace is labeled:

pod-security.kubernetes.io/enforce: privileged
pod-security.kubernetes.io/audit:   restricted
pod-security.kubernetes.io/warn:    restricted

The increase-vm-max-map init container sets vm.max_map_count=262144 via sysctl -w and needs privileged: true to do so. Both baseline and restricted block privileged containers — only the privileged tier admits them. So enforce is privileged, but we set audit and warn to restricted so the full gap to the strictest tier shows up on every apply and in the API audit log: anyone applying the manifests sees a Warning: PodSecurity ... message listing every restricted-tier control the workload violates, even though the pod still gets admitted.
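For reference, an init container of the kind described might look like this. The container name, sysctl command, and privileged flag come from the text above; the image is an assumption.

```yaml
# Fragment of the ES StatefulSet pod spec (sketch).
initContainers:
  - name: increase-vm-max-map
    image: busybox:1.36          # assumed image
    command: ["sysctl", "-w", "vm.max_map_count=262144"]
    securityContext:
      privileged: true           # the setting that forces enforce: privileged
```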

Why not just use restricted?

Two paths to a restricted-compliant namespace exist:

  1. Set vm.max_map_count at the node level. A separate DaemonSet in kube-system runs once per node with a privileged container that sets the sysctl, then ES doesn't need the init container at all. The ES namespace can then be restricted. Tradeoff: requires write access to kube-system (often blocked on hosted control planes), introduces a separate DaemonSet to maintain, and breaks if a node is added after the DaemonSet is removed.
  2. Use a node image where vm.max_map_count=262144 is already the default, in which case the init container is redundant. GKE COS, Bottlerocket, and other modern node images may qualify, but the default varies by image and version, so check sysctl vm.max_map_count on a node first. On those, you could drop the init container and switch to restricted. Hard to assume in a portable demo.
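Option 1 could be sketched as follows. This DaemonSet is not part of the repo; its name, labels, and images are hypothetical. It works because vm.max_map_count is a node-wide sysctl, so a privileged container setting it changes the value for every pod on that node.

```yaml
# Illustrative only; not applied here.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: set-vm-max-map-count         # hypothetical
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: set-vm-max-map-count
  template:
    metadata:
      labels:
        app: set-vm-max-map-count
    spec:
      initContainers:
        - name: sysctl
          image: busybox:1.36        # assumed image
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
      containers:
        # Keeps the pod alive so the DaemonSet also covers nodes added later.
        - name: pause
          image: registry.k8s.io/pause:3.9   # assumed tag
```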

This setup picks the pragmatic middle ground: keep the privileged init container, set enforce: privileged (the only tier that admits it), and surface the gap to restricted via audit + warn.

NetworkPolicy matrix

| Source | Destination | Port | Direction | Allowed by |
| --- | --- | --- | --- | --- |
| Kibana | ES | 9200/TCP | egress (kibana NP) + ingress (es NP) | both policies |
| Fluentd | ES | 9200/TCP | egress (fluentd NP) + ingress (es NP) | both policies |
| ES | ES | 9300/TCP | ingress (es NP) | ES NP only |
| anywhere | Kibana | 5601/TCP | ingress (kibana NP) | kibana NP (toggleable to ingress-ns) |
| Kibana | kube-dns | 53/UDP+TCP | egress (kibana NP) | kibana NP |
| Fluentd | kube-dns | 53/UDP+TCP | egress (fluentd NP) | fluentd NP |
| App pods | ES | 9200/TCP | denied | (only Kibana + Fluentd are allowed) |
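The ES rows of the matrix could be expressed as a policy along these lines. It is a sketch: the policy name and the app labels are assumptions, and the repo's actual selectors may differ.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: es-ingress                 # hypothetical
  namespace: logging
spec:
  podSelector:
    matchLabels:
      app: elasticsearch           # assumed label
  policyTypes: [Ingress]
  ingress:
    # Client traffic: only Kibana and Fluentd may reach 9200.
    - from:
        - podSelector:
            matchLabels: {app: kibana}     # assumed label
        - podSelector:
            matchLabels: {app: fluentd}    # assumed label
      ports:
        - {protocol: TCP, port: 9200}
    # Transport traffic: ES pods talk to each other on 9300.
    - from:
        - podSelector:
            matchLabels: {app: elasticsearch}
      ports:
        - {protocol: TCP, port: 9300}
```

Application pods match none of the from selectors, which is what produces the denied row.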

The Kibana ingress rule is wide-open (from: [{}]) because Kibana is the user-facing surface. To restrict it to an in-cluster ingress controller, replace the rule with:

- from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: ingress-nginx
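On the egress side, the Fluentd rows of the matrix might translate to something like the policy below. The policy name and pod labels are assumptions; the k8s-app: kube-dns label is the conventional one for cluster DNS but should be confirmed on the target cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: fluentd-egress             # hypothetical
  namespace: logging
spec:
  podSelector:
    matchLabels: {app: fluentd}    # assumed label
  policyTypes: [Egress]
  egress:
    # Log writes to Elasticsearch.
    - to:
        - podSelector:
            matchLabels: {app: elasticsearch}   # assumed label
      ports:
        - {protocol: TCP, port: 9200}
    # DNS lookups against kube-dns.
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - {protocol: UDP, port: 53}
        - {protocol: TCP, port: 53}
```

The matrix lists only these two destinations. If Fluentd runs on the pod network and the metadata filter calls the Kubernetes API server, that path would need an egress rule of its own.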

Known weaknesses

  • Fluentd image (v1.4.2) is from 2019. A drop-in bump to fluent/fluentd-kubernetes-daemonset:v1.14-debian-elasticsearch7-1.0 preserves ES 7.x compatibility and brings newer Ruby/OpenSSL. Held back here to honor the pinned-version constraint.
  • Elasticsearch 7.14 reached EOL in early 2023 (18 months after its August 2021 release). ES 8.x is the supported line; ECK is the operator-based path. Both are out of scope for this portfolio version but called out in the README.
  • Single elastic superuser instead of per-component scoped users. Documented above.
  • No cert auto-rotation. Manual procedure documented; cert-manager or ECK would solve it.
  • No audit logging. ES audit logs and k8s API audit logs are both off.
  • Fluentd runs as uid 0 to read host log paths. Mitigated by capabilities.drop: [ALL] and adding back only DAC_READ_SEARCH, which is the standard pattern for collectors.
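The Fluentd mitigation in the last bullet corresponds to a container securityContext roughly like this. The capability settings are from the bullet; allowPrivilegeEscalation: false is an assumed extra, not something this document states.

```yaml
# Fragment of the Fluentd container spec (sketch).
securityContext:
  runAsUser: 0                       # root, needed to read host log paths
  allowPrivilegeEscalation: false    # assumed additional hardening
  capabilities:
    drop: [ALL]
    add: [DAC_READ_SEARCH]           # bypass read/search permission checks only
```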

Cloud secret management

In production you'd want passwords and the PKCS12 keystore to come from a cloud secret manager rather than make secrets. External Secrets Operator is the conventional bridge. The shape of the integration:

# Not applied here — illustrative.
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: elastic-credentials
  namespace: logging
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager   # or gcp-sm, azure-kv
    kind: ClusterSecretStore
  target:
    name: elastic-credentials   # what the workloads reference today
  data:
    - secretKey: password
      remoteRef:
        key: prod/efk/elastic
        property: password
    - secretKey: xpack_encryptionkey
      remoteRef:
        key: prod/efk/elastic
        property: xpack_encryptionkey

The workload manifests don't change — they keep referencing secretKeyRef: { name: elastic-credentials, key: password }. The Secret is just managed by the operator instead of make secrets.
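The ExternalSecret above refers to a ClusterSecretStore named aws-secrets-manager. A sketch of what that store could look like for AWS, assuming IRSA-style authentication; the region and the ServiceAccount reference are placeholders, not values from this repo.

```yaml
# Not applied here; illustrative.
apiVersion: external-secrets.io/v1
kind: ClusterSecretStore
metadata:
  name: aws-secrets-manager
spec:
  provider:
    aws:
      service: SecretsManager
      region: us-east-1                  # placeholder region
      auth:
        jwt:
          serviceAccountRef:
            name: external-secrets       # assumed operator ServiceAccount
            namespace: external-secrets  # assumed operator namespace
```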
