I'm running the habana operator.
Immediately after deploying it with Helm (following https://docs.habana.ai/en/latest/Installation_Guide/Additional_Installation/Kubernetes_Installation/Kubernetes_Operator.html#intel-gaudi-operator-for-kubernetes),
the habana-ai-feature-discovery-ds pods go into CrashLoopBackOff with this error:
exec: "--nfd": executable file not found in $PATH: unknown
The reason is simple: ENTRYPOINT was not added in gaudi-feature-discovery/Dockerfile (line 53 at 36c4afe).
So the spec created by the operator is invalid, because "--nfd" ends up being executed as the command: https://github.com/HabanaAI/gaudi-base-operator/blob/ee4ba038b5ab7d8d3231f88a54f1017728392ef4/internal/controller/feature_discovery.go#L256-L274
I can confirm that after building with
FROM intel/gaudi-feature-discovery:1.23.0-695
ENTRYPOINT ["/hfd"]
CMD ["--nfd"]
the pod spins up successfully (the rebuilt image is currently published at quay.io/dtrifiro/gaudi-feature-discovery:1.23.0-695)
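
For anyone hitting the same error, here is a rough sketch of how Kubernetes combines the image's ENTRYPOINT/CMD with the container's command/args, which would explain why "--nfd" gets exec'd directly. It assumes the operator sets only args (["--nfd"]) and no command, which is consistent with the error above but which I have not confirmed beyond the linked lines. `effective_argv` is a hypothetical helper written for illustration, not part of Kubernetes or the operator.

```python
from typing import Optional, Sequence


def effective_argv(
    image_entrypoint: Sequence[str],
    image_cmd: Sequence[str],
    command: Optional[Sequence[str]] = None,
    args: Optional[Sequence[str]] = None,
) -> list:
    """Hypothetical helper: mimic how a pod's command/args override an image's ENTRYPOINT/CMD."""
    if command:
        # command replaces ENTRYPOINT, and the image CMD is ignored
        return list(command) + list(args or [])
    if args:
        # args replace the image CMD and are appended to ENTRYPOINT
        return list(image_entrypoint) + list(args)
    # nothing overridden: the image defaults are used
    return list(image_entrypoint) + list(image_cmd)


# Image without ENTRYPOINT (assumed state of the published image):
# the first element of argv is what the runtime tries to exec.
print(effective_argv([], [], args=["--nfd"]))  # ['--nfd']

# Rebuilt image with ENTRYPOINT ["/hfd"] and CMD ["--nfd"]:
print(effective_argv(["/hfd"], ["--nfd"], args=["--nfd"]))  # ['/hfd', '--nfd']
```

With an empty ENTRYPOINT the first element of the resulting argv is "--nfd", which matches the "executable file not found in $PATH" message. Setting ENTRYPOINT in the image (as in the rebuild above) or setting command in the spec generated by the operator would both avoid it.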