How to Set Up Failover for the DigitalOcean Static Routes Operator

Introduction

The main purpose of the Static Routes Operator is to offer greater flexibility and control over network traffic within your Kubernetes environment. It enables you to tailor the routing configuration to meet your application requirements and optimize network performance. It is deployed as a DaemonSet; hence, it will run on each node of your DigitalOcean Managed Kubernetes cluster.

In this tutorial, you will learn how to manage the routing table of each worker node based on the CRD spec and how to set up a failover gateway.

Prerequisites

  • A working DigitalOcean Managed Kubernetes cluster that you have access to.

  • The kubectl CLI installed on your local machine and configured to point to your DigitalOcean Managed Kubernetes cluster.

  • Two or more NAT gateway Droplets configured and running as detailed here.

  • A system for detecting failures of a gateway Droplet that fits your needs and detects failures clearly and accurately with minimal false alarms. You can use monitoring services such as Prometheus or Nagios, set up health check endpoints on the Droplet, or use alerting tools such as Alertmanager for notifications. For this purpose, you can use a monitoring stack from our marketplace.

Note: Ensure your NAT Gateway Droplet is created in the same VPC as your Kubernetes cluster.
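As an illustration of the failure-detection prerequisite, here is a minimal sketch of a health check that declares a gateway failed only after several consecutive failed probes, which helps reduce false alarms. The function names probe_gateway and gateway_is_healthy, the ICMP probe, and the default thresholds are all assumptions made for this example; they are not part of the operator, and a monitoring stack such as Prometheus would normally take this role.

```shell
#!/bin/sh
# Hypothetical probe: succeeds if the gateway answers a single ping.
# Assumes Linux ping (-W is a timeout in seconds) and that ICMP is allowed;
# replace this with whatever health check fits your gateway.
probe_gateway() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# Returns 0 as soon as one probe succeeds, or 1 after max_failures
# consecutive failed probes.
# Usage: gateway_is_healthy <gateway-ip> [max_failures] [interval_seconds]
gateway_is_healthy() {
    gateway_ip=$1
    max_failures=${2:-3}
    interval=${3:-5}
    failures=0
    while [ "$failures" -lt "$max_failures" ]; do
        if probe_gateway "$gateway_ip"; then
            return 0
        fi
        failures=$((failures + 1))
        # Pause between retries, but not after the final failed probe
        if [ "$failures" -lt "$max_failures" ]; then
            sleep "$interval"
        fi
    done
    return 1
}
```

For example, gateway_is_healthy 10.116.0.4 || echo "primary gateway is down" would report a failure after three failed probes spaced five seconds apart.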

Below is the architecture diagram:

[Architecture Diagram]

Deploying the Kubernetes Static Routes Operator

Deploy the latest release of the static routes operator to your DigitalOcean Managed Kubernetes cluster using kubectl:

kubectl apply -f https://raw.githubusercontent.com/digitalocean/k8s-staticroute-operator/main/releases/v1/k8s-staticroute-operator-v1.0.0.yaml

Note: You can check the latest version in the releases path from the k8s-staticroute-operator GitHub repo.

Check if Operator Pods are up and running

Let’s verify that the operator is up and running by listing the StaticRoute resources.

kubectl get staticroutes -o wide -n static-routes

The output looks similar to the following:

Output
NAME                       AGE    DESTINATIONS      GATEWAY
static-route-ifconfig.me   119s   ["XX.XX.XX.XX"]   XX.XX.XX.XX
static-route-ipinfo.io     111s   ["XX.XX.XX.XX"]   XX.XX.XX.XX

Now let’s check the operator logs. No exceptions should be reported.

kubectl logs -f ds/k8s-staticroute-operator -n static-routes

You should observe the following output:

Output
Found 2 pods, using pod/k8s-staticroute-operator-498vv
[2023-05-15 14:12:32,282] kopf._core.reactor.r [DEBUG ] Starting Kopf 1.35.6.
[2023-05-15 14:12:32,282] kopf._core.engines.a [INFO ] Initial authentication has been initiated.
[2023-05-15 14:12:32,283] kopf.activities.auth [DEBUG ] Activity 'login_via_pykube' is invoked.
[2023-05-15 14:12:32,285] kopf.activities.auth [DEBUG ] Pykube is configured in cluster with service account.
[2023-05-15 14:12:32,286] kopf.activities.auth [INFO ] Activity 'login_via_pykube' succeeded.
[2023-05-15 14:12:32,286] kopf.activities.auth [DEBUG ] Activity 'login_via_client' is invoked.
[2023-05-15 14:12:32,287] kopf.activities.auth [DEBUG ] Client is configured in cluster with service account.
[2023-05-15 14:12:32,288] kopf.activities.auth [INFO ] Activity 'login_via_client' succeeded.
[2023-05-15 14:12:32,288] kopf._core.engines.a [INFO ] Initial authentication has finished.
[2023-05-15 14:12:32,328] kopf._cogs.clients.w [DEBUG ] Starting the watch-stream for customresourcedefinitions.v1.apiextensions.k8s.io cluster-wide.
[2023-05-15 14:12:32,330] kopf._cogs.clients.w [DEBUG ] Starting the watch-stream for staticroutes.v1.networking.digitalocean.com cluster-wide.

To mitigate the impact of gateway failures, it is advisable to have a standby gateway Droplet prepared for failover when required. Although true high availability (HA) is not supported by the operator at the moment, performing failover helps minimize the duration of service disruption.

Note: The following procedure assumes that all operator instances are up and running correctly at the time of the failover.

Suppose traffic to a designated destination IP address, 34.160.111.145, is routed through the active (primary) gateway, which has the IP address 10.116.0.4. The corresponding StaticRoute definition is stored in the primary.yaml file.

./primary.yaml
apiVersion: networking.digitalocean.com/v1
kind: StaticRoute
metadata:
  name: primary
spec:
  destinations:
    - "34.160.111.145"
  gateway: "10.116.0.4"

Additionally, you have a standby (secondary) gateway with the IP address 10.116.0.12, ready to handle traffic for the same destination IP address. Its StaticRoute definition, stored in secondary.yaml, is identical to the primary one except for the gateway IP address and the object name.

./secondary.yaml
apiVersion: networking.digitalocean.com/v1
kind: StaticRoute
metadata:
  name: secondary
spec:
  destinations:
    - "34.160.111.145"
  gateway: "10.116.0.12"
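Because the two definitions differ only in the object name and the gateway IP address, both files can be generated from a single template. The sketch below shows one way to do this; the function name render_static_route is a hypothetical helper introduced for this example and is not part of the operator.

```shell
#!/bin/sh
# Hypothetical helper: print a StaticRoute manifest for one destination.
# Usage: render_static_route <name> <gateway-ip> <destination-ip>
render_static_route() {
    name=$1
    gateway=$2
    destination=$3
    cat <<EOF
apiVersion: networking.digitalocean.com/v1
kind: StaticRoute
metadata:
  name: ${name}
spec:
  destinations:
    - "${destination}"
  gateway: "${gateway}"
EOF
}
```

For example, render_static_route primary 10.116.0.4 34.160.111.145 > primary.yaml and render_static_route secondary 10.116.0.12 34.160.111.145 > secondary.yaml would reproduce the two files shown above.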

The actual failover procedure then consists of the following steps:

  • Identify that the active gateway with IP address 10.116.0.4 is failing.
  • Delete the currently active StaticRoute.
  • Apply the standby StaticRoute.

Delete the Active StaticRoute

Now let’s delete the currently active StaticRoute.

kubectl delete -f primary.yaml

Wait 30 to 60 seconds to give each operator instance enough time to process the object deletion; that is, respond by removing the route from all nodes.

Apply the Standby StaticRoute

Let’s make the secondary StaticRoute active.

kubectl apply -f secondary.yaml

The operator should pick up the new standby StaticRoute and enter the corresponding routing table entries. Afterward, the failover is completed.

Note: Avoid modifying an existing StaticRoute in place, for example by running kubectl edit staticroute primary and changing only the spec.gateway field. This operation is currently unsupported and may result in failures.
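The delete, wait, and apply steps above can be wrapped in a small script so that a failover is a single command. This is a minimal sketch, not an official tool: the function name failover is hypothetical, the default wait of 60 seconds is taken from the upper end of the 30 to 60 second range mentioned above, and aborting when the delete fails is a design choice made here so that the primary and standby routes are never applied at the same time.

```shell
#!/bin/sh
# Hypothetical helper: fail over from the active StaticRoute to the standby one.
# Usage: failover <active-manifest> <standby-manifest> [wait_seconds]
failover() {
    active_manifest=$1
    standby_manifest=$2
    wait_seconds=${3:-60}

    # Step 1: delete the active StaticRoute so the operator removes its routes.
    # Stop here if the delete fails, to avoid having both routes defined.
    kubectl delete -f "$active_manifest" || return 1

    # Step 2: give each operator instance time to process the deletion.
    sleep "$wait_seconds"

    # Step 3: apply the standby StaticRoute.
    kubectl apply -f "$standby_manifest"
}
```

With the files from this tutorial, the call would be failover primary.yaml secondary.yaml.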

Testing the Setup

The sample manifests create static routes to two websites that report your public IP: ifconfig.me/ip and ipinfo.io/ip. A typical static route definition looks like the following:

apiVersion: networking.digitalocean.com/v1
kind: StaticRoute
metadata:
  name: static-route-ifconfig.me
spec:
  destinations:
    - "34.160.111.145"
  gateway: "10.116.0.5"

To test the setup, download the sample manifests for ifconfig.me and ipinfo.io from the examples location:

curl -O https://raw.githubusercontent.com/digitalocean/k8s-staticroute-operator/main/examples/static-route-ifconfig.me.yaml
curl -O https://raw.githubusercontent.com/digitalocean/k8s-staticroute-operator/main/examples/static-route-ipinfo.io.yaml

After downloading the manifests, replace each manifest file’s <> placeholders. Then, apply each manifest using kubectl:

kubectl apply -f static-route-ifconfig.me.yaml
kubectl apply -f static-route-ipinfo.io.yaml

Finally, test whether the curl-test pod reports your NAT gateway’s public IP for each route (this assumes a pod named curl-test, with curl installed, is running in your cluster):

kubectl exec -it curl-test -- curl ifconfig.me/ip
kubectl exec -it curl-test -- curl ipinfo.io/ip

Use the same test during failover testing. While the primary gateway Droplet is active, the result should be the public IP of the primary Droplet. After failing over to the secondary gateway Droplet, the result should be the public IP of the secondary Droplet.
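This comparison can also be scripted. The following sketch fetches the egress IP through the curl-test pod and compares it with the public IP you expect for the currently active gateway. The function names egress_ip and egress_matches are hypothetical, and the IP address in the usage example is a documentation placeholder, not a real gateway address.

```shell
#!/bin/sh
# Hypothetical helper: print the public IP seen by ifconfig.me for the
# curl-test pod. The -it flags are dropped so this also works in scripts
# without a terminal; curl -s suppresses the progress meter.
egress_ip() {
    kubectl exec curl-test -- curl -s ifconfig.me/ip
}

# Returns 0 if the egress IP equals the expected gateway public IP,
# 1 if it differs, and 2 if the lookup itself failed.
# Usage: egress_matches <expected-public-ip>
egress_matches() {
    expected_ip=$1
    actual_ip=$(egress_ip) || return 2
    [ "$actual_ip" = "$expected_ip" ]
}
```

For example, after failing over you could run egress_matches 203.0.113.20 && echo "traffic leaves via the secondary gateway", replacing 203.0.113.20 with the public IP of your secondary Droplet.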

Troubleshooting

  • Check the StaticRoute object: If an error occurs, first look for errors in the StaticRoute events for each node where the route is applied.
kubectl get StaticRoute <static-route-name> -o yaml
  • Check logs: To dig deeper, you can check for errors in the static route operator logs.
kubectl logs -f ds/k8s-staticroute-operator -n static-routes
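To make the log check quicker, the logs can be filtered for error lines. The sketch below counts such lines in the most recent log output. It assumes that errors appear with an ERROR or CRITICAL level in the same bracketed format as the DEBUG and INFO lines shown earlier, or as a Python traceback; the function name count_operator_errors is hypothetical. Note that kubectl logs on a DaemonSet reads from a single pod, so this covers one operator instance at a time.

```shell
#!/bin/sh
# Hypothetical helper: count error-like lines in recent operator logs.
# Usage: count_operator_errors [tail_lines]
count_operator_errors() {
    tail_lines=${1:-500}
    # grep -c prints the number of matching lines; "|| true" keeps the
    # function's exit status at 0 when there are no matches.
    kubectl logs ds/k8s-staticroute-operator -n static-routes --tail="$tail_lines" \
        | grep -c -E '\[(ERROR|CRITICAL) *\]|Traceback' || true
}
```

A result of 0 suggests that the sampled operator instance has not logged errors recently; any other value is a cue to read the full logs.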

Clean up

To remove the operator and associated resources, please run the following kubectl command (make sure you’re using the same release version as in the install step):

kubectl delete -f https://raw.githubusercontent.com/digitalocean/k8s-staticroute-operator/main/releases/v1/k8s-staticroute-operator-v1.0.0.yaml

Note: The above command also deletes the associated namespace (static-routes). Back up your CRDs first if you will need them later.

The output looks similar to:

customresourcedefinition.apiextensions.k8s.io "staticroutes.networking.digitalocean.com" deleted
serviceaccount "k8s-staticroute-operator" deleted
clusterrole.rbac.authorization.k8s.io "k8s-staticroute-operator" deleted
clusterrolebinding.rbac.authorization.k8s.io "k8s-staticroute-operator" deleted
daemonset.apps "k8s-staticroute-operator" deleted

Now, if you test the same curl command, you will get the worker node IP as an output:

kubectl exec -it curl-test -- curl ifconfig.me/ip
kubectl exec -it curl-test -- curl ipinfo.io/ip

Now check the worker node’s public IP:

kubectl get nodes -o wide

Conclusion

Implementing failover capabilities, even if true high availability (HA) is not fully supported, is a recommended approach to minimize the impact of gateway failures.

Organizations can significantly reduce the duration of service disruptions by having a standby gateway ready for failover when needed.

It is important to prepare a standby gateway Droplet and ensure a smooth transition when failing over. While the implementation may vary depending on specific requirements, prioritizing failover readiness can contribute to maintaining reliable and uninterrupted service delivery.

Next Steps

You can refer to our documentation to configure a Droplet as a gateway.

Our official Managed Kubernetes product documentation provides more information on DigitalOcean Managed Kubernetes and its features.

You can contact our sales team to migrate to DigitalOcean or talk to our Solution Engineers.

About the authors
Solutions Engineer II


Sr Technical Writer

Sr. Technical Writer@ DigitalOcean | Medium Top Writers(AI & ChatGPT) | 2M+ monthly views & 34K Subscribers | Ex Cloud Consultant @ AMEX | Ex SRE(DevOps) @ NUTANIX

