By vanpyrzericj
After following, to the letter, the tutorial provided by DO here on setting up NGINX Ingress and cert-manager, I have been unable to validate my subdomains after moving from ACME staging to ACME production.
When using the HTTP01 method and inspecting the (failing) challenge, I see a connection timeout when it connects to the IP address of the load balancer (fronting the NGINX ingress) to check the /.well-known/ tokens. When I access the well-known link in my browser, it resolves just fine. I waited 48 hours for DNS propagation just in case, with no further luck. As far as I can tell, whatever pod the challenge runs from isn't able to "dial out" and then "dial back in" through the LB, and therefore can't reach and validate the tokens.
When switching to DNS01, connecting to Cloudflare with my API key, everything works fine for ACME staging (I even see the staging DNS records created in my Cloudflare console), but again, when switching to ACME production, I get a different type of failure ("no solvers found for challenge"). I believe this DNS01 problem is a separate issue, which I've opened a ticket for on cert-manager's GitHub repo, but I'm adding it here in case the two are related, perhaps having something to do with the way DO has its managed Kubernetes service set up.
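For reference, the DNS01 setup being described looks roughly like the ClusterIssuer sketch below. The email, secret names, and key name are assumptions, not my actual config; note also that the "no solvers found" error is often caused by a solver `selector` that doesn't match the Certificate's dnsNames:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Production ACME endpoint (staging uses the ...staging-v02... URL)
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # assumption: contact email
    privateKeySecretRef:
      name: letsencrypt-prod-key        # assumption: account key secret
    solvers:
      - dns01:
          cloudflare:
            email: admin@example.com
            apiKeySecretRef:            # or apiTokenSecretRef for a scoped token
              name: cloudflare-api-key  # assumption: secret name
              key: api-key
        # If you add a selector here, it must match the Certificate's
        # dnsNames, otherwise cert-manager reports "no solvers found".
```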
Any guidance is appreciated. I'm fairly new to the kube world, but until now I thought I had a fairly decent grasp of how things were working.
There is a known issue in the Kubernetes project (https://github.com/kubernetes/kubernetes/issues/66607) for connections from within the cluster accessing public URLs that point back to the cluster. When traffic originating from a Kubernetes pod is addressed to the load balancer, the kube-proxy service intercepts it and directs it straight to the service, instead of letting it go out to the load balancer and back to the node. This can cause a variety of issues depending on the service that receives these requests; in DOKS it often results in "timeout", "bad header", or SSL error messages.
The current workaround options are the following:
Access DOKS services through their resolvable service names, or by using the desired service's ClusterIP.
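As an illustration of that first workaround (all names here are assumptions; adjust the namespace and service to your install): instead of hitting the load balancer's public IP from inside the cluster, target the ingress controller's in-cluster DNS name, for example:

```shell
# Run a throwaway pod and curl the ingress controller via its
# in-cluster service name instead of the public load balancer IP.
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -H "Host: sub.example.com" \
  http://ingress-nginx-controller.ingress-nginx.svc.cluster.local/.well-known/acme-challenge/test
```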
Or, on newer (1.14+) clusters, use this service annotation:
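The annotation itself appears to be missing from the post above; the annotation DO documents for this workaround is `service.beta.kubernetes.io/do-loadbalancer-hostname`. A sketch, assuming a standard ingress-nginx install and a hostname you control pointed at the LB:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller      # assumption: your ingress service name
  namespace: ingress-nginx
  annotations:
    # Exposes the LB to the cluster by hostname rather than IP, so
    # pod traffic goes out to the load balancer instead of being
    # short-circuited by kube-proxy on the LB's IP.
    service.beta.kubernetes.io/do-loadbalancer-hostname: "lb.example.com"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
```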
If this is helpful: I've uninstalled cert-manager and am currently issuing my Let's Encrypt certs manually in my cluster.
The hope is that this bug will eventually get fixed, and then I'll reinstall cert-manager and continue as before.
I’ve posted how I do this here: https://github.com/nabsul/k8s-letsencrypt