Hello!

Here is a quick summary. I have a namespace “apps” with three apps deployed, each with a horizontal pod autoscaler. The metrics server is running fine and the top commands work just fine. As a test, I tried to bump the HPA min replicas, and that produced the desired effect:

Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
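
For reference, bumping minReplicas can be done with a patch along these lines (the HPA name and replica count are placeholders, not my actual values):

    # Raise minReplicas on one of the HPAs so more pods are created than the node can hold
    kubectl -n apps patch hpa my-app --type merge -p '{"spec":{"minReplicas":6}}'

    # Confirm the extra pods are stuck in Pending
    kubectl -n apps get pods --field-selector=status.phase=Pending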

I have pod disruption budgets, but with maxUnavailable set to 1, and I currently have three pods stuck in Pending that can’t be scheduled, so I would expect the cluster autoscaler to launch a new node. Here is the status from the autoscaler’s ConfigMap:

Cluster-wide:
      Health:      Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=0)
                   LastProbeTime:      2020-05-07 07:35:46.149337873 +0000 UTC m=+302096.048832692
                   LastTransitionTime: 2020-05-03 19:42:01.523276381 +0000 UTC m=+71.422771221
      ScaleUp:     NoActivity (ready=1 registered=1)
                   LastProbeTime:      2020-05-07 07:35:46.149337873 +0000 UTC m=+302096.048832692
                   LastTransitionTime: 2020-05-03 19:42:01.523276381 +0000 UTC m=+71.422771221
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-05-07 05:46:12.682186809 +0000 UTC m=+295522.581681643
                   LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC

    NodeGroups:
      Name:        3bc1e609-708b-4cc3-a820-fe41079a702a
      Health:      Healthy (ready=1 unready=0 notStarted=0 longNotStarted=0 registered=1 longUnregistered=0 cloudProviderTarget=1 (minSize=1, maxSize=2))
                   LastProbeTime:      2020-05-07 07:35:46.149337873 +0000 UTC m=+302096.048832692
                   LastTransitionTime: 2020-05-07 05:55:15.403329346 +0000 UTC m=+296065.302824136
      ScaleUp:     NoActivity (ready=1 cloudProviderTarget=1)
                   LastProbeTime:      2020-05-07 07:35:46.149337873 +0000 UTC m=+302096.048832692
                   LastTransitionTime: 2020-05-07 05:55:15.403329346 +0000 UTC m=+296065.302824136
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-05-07 05:46:12.682186809 +0000 UTC m=+295522.581681643
                   LastTransitionTime: 2020-05-07 05:46:12.682186809 +0000 UTC m=+295522.581681643

      Name:        3ade42aa-f164-469f-916e-76112164a22e
      Health:      Healthy (ready=0 unready=0 notStarted=0 longNotStarted=0 registered=0 longUnregistered=0 cloudProviderTarget=0 (minSize=0, maxSize=3))
                   LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                   LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
      ScaleUp:     NoActivity (ready=0 cloudProviderTarget=0)
                   LastProbeTime:      0001-01-01 00:00:00 +0000 UTC
                   LastTransitionTime: 0001-01-01 00:00:00 +0000 UTC
      ScaleDown:   NoCandidates (candidates=0)
                   LastProbeTime:      2020-05-07 05:46:12.682186809 +0000 UTC m=+295522.581681643
                   LastTransitionTime: 2020-05-03 19:54:05.185322073 +0000 UTC m=+795.084816916
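
For the record, the status above is what the cluster autoscaler publishes in a ConfigMap; assuming it uses the default name on DOKS, it can be read with:

    kubectl -n kube-system get configmap cluster-autoscaler-status -o yaml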

I added a second node pool with autoscaling enabled just to be sure, but the pending pods don’t even seem to get an event from the cluster autoscaler. I would expect a TriggeredScaleUp event, or at least a failure event, but it’s completely silent. Any idea?
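
For what it’s worth, this is how I’ve been looking for autoscaler events (the pod name is a placeholder):

    # Events attached to one of the pending pods; a successful scale-up should show
    # TriggeredScaleUp, and a refused one usually shows NotTriggerScaleUp
    kubectl -n apps describe pod <pending-pod-name>

    # All recent events in the namespace, newest last
    kubectl -n apps get events --sort-by=.lastTimestamp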

2 answers

Sooo, after playing with it a bit, it seems that when I delete the second node pool, scaling works as expected. I don’t think this behaviour should be expected; I believe it is either a bug or an undocumented limitation.

  • I’ve noticed the same behaviour this weekend.

    I have 2 pools, both with autoscaling enabled. With both pools present, the cluster autoscaler takes no action. Deleting the second pool entirely causes autoscaling to start working against the default pool again.

    Also, in my situation the second node group had a minimum size of 0 and currently had 0 droplets. Changing the min size of the second node pool to > 0 also appeared to fix the issue. It seems the autoscaler just breaks if any node pool has 0 droplets.
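
    For anyone hitting the same thing, the min size can be bumped with doctl rather than the UI, something along these lines (the cluster and pool IDs are placeholders):

        # Raise the second pool's minimum to 1 so no autoscaled pool sits at 0 droplets
        doctl kubernetes cluster node-pool update <cluster-id> <pool-id> \
          --auto-scale --min-nodes 1 --max-nodes 3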

Good catch! I was actually in the same situation then: I also had a pool set with 0 min nodes, so that must be it, I guess.
