Question

Pods stuck in Pending state despite an abundance of resources

Posted May 15, 2021 · 213 views
DigitalOcean Managed Kubernetes

Reposting from here: https://stackoverflow.com/questions/67542665/kubernetes-pod-stuck-pending-but-lacks-events-that-tell-me-why

I have a simple alpine:node Kubernetes pod attempting to start from a deployment on a cluster with a large surplus of resources on every node. It fails to move out of the Pending status. When I run kubectl describe, I get no events that explain why this is happening. What are the next steps for debugging a problem like this?

Here are the relevant commands and their output:

kubectl get events

60m         Normal   SuccessfulCreate    replicaset/frontend-r0ktmgn9-dcc95dfd8    Created pod: frontend-r0ktmgn9-dcc95dfd8-8wn9j
36m         Normal   ScalingReplicaSet   deployment/frontend-r0ktmgn9              Scaled down replica set frontend-r0ktmgn9-6d57cb8698 to 0
36m         Normal   SuccessfulDelete    replicaset/frontend-r0ktmgn9-6d57cb8698   Deleted pod: frontend-r0ktmgn9-6d57cb8698-q52h8
36m         Normal   ScalingReplicaSet   deployment/frontend-r0ktmgn9              Scaled up replica set frontend-r0ktmgn9-58cd8f4c79 to 1
36m         Normal   SuccessfulCreate    replicaset/frontend-r0ktmgn9-58cd8f4c79   Created pod: frontend-r0ktmgn9-58cd8f4c79-fn5q4

kubectl describe po/frontend-r0ktmgn9-58cd8f4c79-fn5q4 (some parts redacted)

Name:           frontend-r0ktmgn9-58cd8f4c79-fn5q4
Namespace:      default
Priority:       0
Node:           <none>
Labels:         app=frontend
                pod-template-hash=58cd8f4c79
Annotations:    kubectl.kubernetes.io/restartedAt: 2021-05-14T20:02:11-05:00
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/frontend-r0ktmgn9-58cd8f4c79
Containers:
  frontend:
    Image:      [Redacted]
    Port:       3000/TCP
    Host Port:  0/TCP
    Environment: [Redacted]
    Mounts:                 <none>
Volumes:                    <none>
QoS Class:                  BestEffort
Node-Selectors:             <none>
Tolerations:                node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                            node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                     <none>

I use Loft virtual clusters, so the commands above were run in a virtual-cluster context, where this pod's deployment is the only resource. Here is the same view from the main (host) cluster itself:

kubectl describe nodes

Name:               autoscale-pool-01-8bwo1
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=g-8vcpu-32gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=d7c71f70-35bd-4854-9527-28f56adfb4c4
                    doks.digitalocean.com/node-pool=autoscale-pool-01
                    doks.digitalocean.com/node-pool-id=c31388cc-29c8-4fb9-9c52-c309dba972d3
                    doks.digitalocean.com/version=1.20.2-do.0
                    failure-domain.beta.kubernetes.io/region=nyc1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=autoscale-pool-01-8bwo1
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=g-8vcpu-32gb
                    region=nyc1
                    topology.kubernetes.io/region=nyc1
                    wireguard_capable=false
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.116.0.3
                    csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"246129007"}
                    io.cilium.network.ipv4-cilium-host: 10.244.0.171
                    io.cilium.network.ipv4-health-ip: 10.244.0.198
                    io.cilium.network.ipv4-pod-cidr: 10.244.0.128/25
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 14 May 2021 19:56:44 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  autoscale-pool-01-8bwo1
  AcquireTime:     <unset>
  RenewTime:       Fri, 14 May 2021 21:33:44 -0500
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 14 May 2021 19:57:01 -0500   Fri, 14 May 2021 19:57:01 -0500   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Fri, 14 May 2021 21:30:33 -0500   Fri, 14 May 2021 19:56:44 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 14 May 2021 21:30:33 -0500   Fri, 14 May 2021 19:56:44 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 14 May 2021 21:30:33 -0500   Fri, 14 May 2021 19:56:44 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 14 May 2021 21:30:33 -0500   Fri, 14 May 2021 19:57:04 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    autoscale-pool-01-8bwo1
  InternalIP:  10.116.0.3
  ExternalIP:  134.122.31.92
Capacity:
  cpu:                8
  ephemeral-storage:  103176100Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             32941864Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  95087093603
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             29222Mi
  pods:               110
System Info:
  Machine ID:                 a98e294e721847469503cd531b9bc88e
  System UUID:                a98e294e-7218-4746-9503-cd531b9bc88e
  Boot ID:                    a16de75d-7532-441d-885a-de90fb2cb286
  Kernel Version:             4.19.0-11-amd64
  OS Image:                   Debian GNU/Linux 10 (buster)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.4.3
  Kubelet Version:            v1.20.2
  Kube-Proxy Version:         v1.20.2
ProviderID:                   digitalocean://246129007
Non-terminated Pods:          (28 in total) [Redacted]
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests          Limits
  --------           --------          ------
  cpu                2727m (34%)       3202m (40%)
  memory             9288341376 (30%)  3680Mi (12%)
  ephemeral-storage  0 (0%)            0 (0%)
  hugepages-1Gi      0 (0%)            0 (0%)
  hugepages-2Mi      0 (0%)            0 (0%)
Events:              <none>


Name:               autoscale-pool-02-8mly8
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=m-2vcpu-16gb
                    beta.kubernetes.io/os=linux
                    doks.digitalocean.com/node-id=eb0f7d72-d183-4953-af0c-36a88bc64921
                    doks.digitalocean.com/node-pool=autoscale-pool-02
                    doks.digitalocean.com/node-pool-id=18a37926-d208-4ab9-b17d-b3f9acb3ce0f
                    doks.digitalocean.com/version=1.20.2-do.0
                    failure-domain.beta.kubernetes.io/region=nyc1
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=autoscale-pool-02-8mly8
                    kubernetes.io/os=linux
                    node.kubernetes.io/instance-type=m-2vcpu-16gb
                    region=nyc1
                    topology.kubernetes.io/region=nyc1
                    wireguard_capable=true
Annotations:        alpha.kubernetes.io/provided-node-ip: 10.116.0.12
                    csi.volume.kubernetes.io/nodeid: {"dobs.csi.digitalocean.com":"237830322"}
                    io.cilium.network.ipv4-cilium-host: 10.244.3.115
                    io.cilium.network.ipv4-health-ip: 10.244.3.96
                    io.cilium.network.ipv4-pod-cidr: 10.244.3.0/25
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sat, 20 Mar 2021 18:14:37 -0500
Taints:             <none>
Unschedulable:      false
Lease:
  HolderIdentity:  autoscale-pool-02-8mly8
  AcquireTime:     <unset>
  RenewTime:       Fri, 14 May 2021 21:33:44 -0500
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Tue, 06 Apr 2021 16:24:45 -0500   Tue, 06 Apr 2021 16:24:45 -0500   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Fri, 14 May 2021 21:33:35 -0500   Tue, 13 Apr 2021 18:40:21 -0500   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 14 May 2021 21:33:35 -0500   Wed, 05 May 2021 15:16:08 -0500   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure          False   Fri, 14 May 2021 21:33:35 -0500   Tue, 06 Apr 2021 16:24:40 -0500   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Fri, 14 May 2021 21:33:35 -0500   Tue, 06 Apr 2021 16:24:49 -0500   KubeletReady                 kubelet is posting ready status. AppArmor enabled
Addresses:
  Hostname:    autoscale-pool-02-8mly8
  InternalIP:  10.116.0.12
  ExternalIP:  157.230.208.24
Capacity:
  cpu:                2
  ephemeral-storage:  51570124Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             16427892Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  47527026200
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  memory:             13862Mi
  pods:               110
System Info:
  Machine ID:                 7c8d577266284fa09f84afe03296abe8
  System UUID:                cf5f4cc0-17a8-4fae-b1ab-e0488675ae06
  Boot ID:                    6698c614-76a0-484c-bb23-11d540e0e6f3
  Kernel Version:             4.19.0-16-amd64
  OS Image:                   Debian GNU/Linux 10 (buster)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.4.4
  Kubelet Version:            v1.20.5
  Kube-Proxy Version:         v1.20.5
ProviderID:                   digitalocean://237830322
Non-terminated Pods:          (73 in total) [Redacted]
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests      Limits
  --------           --------      ------
  cpu                1202m (60%)   202m (10%)
  memory             2135Mi (15%)  5170Mi (37%)
  ephemeral-storage  0 (0%)        0 (0%)
  hugepages-1Gi      0 (0%)        0 (0%)
  hugepages-2Mi      0 (0%)        0 (0%)
Events:              <none>

1 answer

Hi there,

I think this could be due to the hostPort that you've defined in your pod template. When a container requests a hostPort, the scheduler can only place the pod on a node where that exact port is free, which can leave a pod Pending even when CPU and memory are plentiful.

I would suggest removing the hostPort from the template and giving it another try.
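
For reference, here is a minimal sketch of what that change might look like in the Deployment's container spec (the image and names are placeholders, since the original template wasn't shared):

# Hypothetical container spec: keep the containerPort and drop the
# hostPort line so the scheduler is free to place the pod on any node.
containers:
  - name: frontend
    image: your-frontend-image:latest   # placeholder
    ports:
      - containerPort: 3000
        # hostPort: 3000   <- remove this line, then redeploy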

If this is not the case, feel free to share your YAML template here so that I can take another look.
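
In the meantime, since you are on a Loft virtual cluster, it may also be worth inspecting the pod's synced copy on the host cluster: scheduling happens there, and FailedScheduling events sometimes only surface on the host side. Roughly like this (the namespace and pod name depend on your vcluster installation, so treat them as placeholders):

# From the host cluster context: find the synced copy of the pending pod
kubectl get pods --all-namespaces | grep frontend

# Describe it in the vcluster's host namespace; scheduler events
# (for example FailedScheduling) would appear here
kubectl describe pod <synced-pod-name> -n <vcluster-namespace>

# List recent scheduling failures cluster-wide
kubectl get events --all-namespaces --field-selector reason=FailedScheduling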

Regards,
Bobby