Question

Cert Manager Can't Validate ACME over HTTP01 per DO Tutorial

After following, to the letter, the tutorial provided by DO hereon setting up NGINX Ingress & Cert Manager, I have been unable to verify my subdomains once moving from ACME staging to ACME prod.

When using the HTTP01 method, and inspecting the (failing) challenge, I observe that it experiences a connection time out when connecting to the IP address of the load balancer (represented by the nginx ingress) when attempting to check the /.well-known/ tokens. When accessing the well-known link in my browser, it resolves just fine. I waiting 48 hours for DNS propagation just in case, with no further luck. As far as I can tell, whatever pod the challenge is running from, isn’t able to “dial out” to then “dial back in” and hit the LB and therefore access & validate the tokens.

When switching to DNS01, connecting to CloudFlare using my api key, all works just fine for ACME stage (I even see the staging DNS records created in my CF console), but again, when switching to ACME prod, I get a different type of failure (No solvers found for challenge). I believe this DNS01 is a separate issue, which I’ve opened a ticket for on cert-manager’s GitHub repo, but I am adding it here in case they may be related to the HTTP01 issue described above, perhaps having something to do with the way DO has their managed Kubernetes service set up.

Any guidance is appreciated. I’m fairly new to the kube world, but up until now, thought I had a fairly decent grasp on how things were working.


Submit an answer

This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

Sign In or Sign Up to Answer

These answers are provided by our Community. If you find them useful, show some love by clicking the heart. If you run into issues leave a comment, or add your own answer to help others.

Want to learn more? Join the DigitalOcean Community!

Join our DigitalOcean community of over a million developers for free! Get help and share knowledge in Q&A, subscribe to topics of interest, and get courses and tools that will help you grow as a developer and scale your project or business.

If this is helpful: I’ve uninstalled cert-manager and am currently just manually issuing my letsencrypt certs in my cluster.

The hope is that this bug will get fixed eventually and then I’ll reinstall cert-manager and continue as before.

I’ve posted how I do this here: https://github.com/nabsul/k8s-letsencrypt

This comment has been deleted

There is a known issue in the kubernetes project(https://github.com/kubernetes/kubernetes/issues/66607)for connections from within the cluster accessing public urls to the cluster. When traffic that originates from a Kubernetes pod goes through the Load Balancer, the kube-proxy service intercepts the traffic and directs it to the service instead of letting it go out to the Load Balancer and back to the node. This can cause a variety of issues depending on the service that receives these requests. In DOKS this often results in: “timeout”, “bad header”, or SSL error messages.

Current options for workaround are the following:

Access DOKS services through their resolvable service names or by using the desired services clusterIP.

or using this service annotation on newer(1.14+) clusters:

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/2b8677c5c9f5a32a1ebe7b92cbd2b9687fee6eaa/docs/controllers/services/annotations.md#servicebetakubernetesiodo-loadbalancer-hostname

https://github.com/digitalocean/digitalocean-cloud-controller-manager/blob/2b8677c5c9f5a32a1ebe7b92cbd2b9687fee6eaa/docs/controllers/services/examples/README.md#https-or-http2-load-balancer-with-hostname