Cert Manager Can't Validate ACME over HTTP01 per DO Tutorial

Posted January 6, 2020 2.4k views
DigitalOcean Managed Kubernetes

After following, to the letter, the tutorial provided by DO on setting up NGINX Ingress & Cert Manager, I have been unable to verify my subdomains after moving from ACME staging to ACME prod.

When using the HTTP01 method and inspecting the (failing) challenge, I observe that it hits a connection timeout when connecting to the IP address of the load balancer (represented by the nginx ingress) to check the /.well-known/ tokens. When I access the well-known link in my browser, it resolves just fine. I waited 48 hours for DNS propagation just in case, with no further luck. As far as I can tell, whatever pod the challenge is running from isn’t able to “dial out” and then “dial back in” to hit the LB, and therefore can’t access & validate the tokens.
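To make the "works in my browser, times out from the cluster" symptom easier to reproduce, here is a minimal sketch of the kind of check I mean. It only assumes the standard HTTP-01 path (`/.well-known/acme-challenge/<token>`); the function names `challenge_url` and `probe` are my own and are not part of cert-manager, and the domain and token you pass in are placeholders.

```python
import urllib.error
import urllib.request


def challenge_url(domain: str, token: str) -> str:
    """Build the HTTP-01 challenge URL for a domain and token (both placeholders)."""
    return f"http://{domain}/.well-known/acme-challenge/{token}"


def probe(url: str, timeout: float = 5.0) -> str:
    """Fetch the URL once and summarize the outcome as a short string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return f"ok: HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so the network path works.
        return f"reachable: HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection refused, and timeouts (all OSError subclasses).
        return f"unreachable: {exc}"
```

Running `probe(challenge_url(domain, token))` once from a laptop and once from a pod inside the cluster should show whether the request only fails when it originates inside the cluster.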

When switching to DNS01, connecting to CloudFlare using my API key, everything works just fine for ACME staging (I even see the staging DNS records created in my CF console), but again, when switching to ACME prod, I get a different type of failure (No solvers found for challenge). I believe this DNS01 failure is a separate issue, which I’ve opened a ticket for on cert-manager’s GitHub repo, but I am adding it here in case it is related to the HTTP01 issue described above, perhaps having something to do with the way DO has their managed Kubernetes service set up.

Any guidance is appreciated. I’m fairly new to the kube world, but up until now, thought I had a fairly decent grasp on how things were working.

3 answers

There is a known issue in the Kubernetes project affecting connections from within the cluster to the cluster's own public URLs. When traffic that originates from a Kubernetes pod is addressed to the Load Balancer, the kube-proxy service intercepts the traffic and directs it to the service instead of letting it go out to the Load Balancer and back to the node. This can cause a variety of issues depending on the service that receives these requests. In DOKS this often results in “timeout”, “bad header”, or SSL error messages.

The current workaround options are the following:

Access DOKS services through their resolvable service names or by using the desired service's clusterIP.

or use this service annotation on newer (1.14+) clusters:

  • So if I’m trying to use cert-manager to validate two domains, what would my value for that annotation be? Is using just one of those two domains sufficient, or would the other domain (the one not specified as the LB hostname) fail?

    Keeping consistent with the tutorial mentioned in the OP, the domains are & I’m assuming I can’t have the LB’s hostname bind to multiple domains, can I?

If this is helpful: I’ve uninstalled cert-manager and am currently just manually issuing my Let's Encrypt certs in my cluster.

The hope is that this bug will get fixed eventually and then I’ll reinstall cert-manager and continue as before.

I’ve posted how I do this here: