
Retry socket connection failures to Kubernetes service #26

Description

@jeremywadsack

Once in a while, a job fails with this error:

SocketError - Failed to open TCP connection to kubernetes.default.svc:443 (getaddrinfo: Name or service not known)

The issue is that kube-dns didn't respond with an address. This can happen when the CPU on the node is under extreme load.

Since we already retry KubeClient::HttpError when its message matches /Timed out/, we should add this error pattern to the list of errors we retry.
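Here is a minimal sketch of how the expanded retry list might look. It assumes errors are matched by class plus a message pattern. The names `RETRYABLE_ERRORS`, `retryable?`, and `with_retries` are hypothetical. The attempt count and linear backoff are illustrative choices only. The `KubeClient::HttpError` class is a stub so the snippet runs without the client library.

```ruby
require 'socket' # defines SocketError

# Stub standing in for the real KubeClient::HttpError (assumption: it is a
# StandardError subclass whose message carries the failure text).
module KubeClient
  class HttpError < StandardError; end
end

# Hypothetical list of [error class, message pattern] pairs treated as transient.
RETRYABLE_ERRORS = [
  [KubeClient::HttpError, /Timed out/],
  [SocketError, /getaddrinfo/] # kube-dns returned no address for the service name
].freeze

# True when the error's class and message match one of the retryable entries.
def retryable?(error)
  RETRYABLE_ERRORS.any? { |klass, pattern| error.is_a?(klass) && pattern.match?(error.message) }
end

# Hypothetical helper: runs the block and retries retryable errors up to
# max_attempts total attempts, sleeping base_delay * attempt between tries.
def with_retries(max_attempts: 3, base_delay: 1.0)
  attempts = 0
  begin
    attempts += 1
    yield
  rescue StandardError => e
    # Re-raise anything non-transient, or a transient error once attempts run out.
    raise unless retryable?(e) && attempts < max_attempts
    sleep(base_delay * attempts)
    retry
  end
end
```

This version matches on the message, not on `SocketError` alone, which keeps the retry narrow. A `getaddrinfo` failure under kube-dns load is likely transient, but other socket errors may not be.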
