By vanpyrzericj
After following, to the letter, the tutorial provided by DO here on setting up NGINX Ingress and cert-manager, I have been unable to validate my subdomains after moving from ACME staging to ACME production.
When using the HTTP01 method and inspecting the (failing) challenge, I see a connection timeout when it connects to the IP address of the load balancer (fronting the NGINX ingress) to check the /.well-known/ tokens. When I access the well-known link in my browser, it resolves just fine. I waited 48 hours for DNS propagation just in case, with no further luck. As far as I can tell, whatever pod the challenge runs from isn't able to "dial out" and then "dial back in" through the LB, and therefore can't reach and validate the tokens.
When switching to DNS01, connecting to Cloudflare with my API key, everything works fine for ACME staging (I even see the staging DNS records created in my Cloudflare console), but again, when switching to ACME production, I get a different type of failure ("no solvers found for challenge"). I believe this DNS01 problem is a separate issue, which I've opened a ticket for on cert-manager's GitHub repo, but I'm adding it here in case the two are related, perhaps having something to do with the way DO has its managed Kubernetes service set up.
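For reference, the DNS01 setup being described looks roughly like the ClusterIssuer sketch below. The email, secret names, and key name are assumptions, not my actual config; note also that the "no solvers found" error is often caused by a solver `selector` that doesn't match the Certificate's dnsNames:

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    # Production ACME endpoint (staging uses the ...staging-v02... URL)
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # assumption: contact email
    privateKeySecretRef:
      name: letsencrypt-prod-key        # assumption: account key secret
    solvers:
      - dns01:
          cloudflare:
            email: admin@example.com
            apiKeySecretRef:            # or apiTokenSecretRef for a scoped token
              name: cloudflare-api-key  # assumption: secret name
              key: api-key
        # If you add a selector here, it must match the Certificate's
        # dnsNames, otherwise cert-manager reports "no solvers found".
```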
Any guidance is appreciated. I'm fairly new to the kube world, but until now I thought I had a fairly decent grasp of how things were working.
There is a known issue in the Kubernetes project (https://github.com/kubernetes/kubernetes/issues/66607) for connections from within the cluster accessing public URLs that point back to the cluster. When traffic originating from a Kubernetes pod is addressed to the load balancer, the kube-proxy service intercepts it and directs it straight to the service, instead of letting it go out to the load balancer and back to the node. This can cause a variety of issues depending on the service that receives these requests; in DOKS it often results in "timeout", "bad header", or SSL error messages.
The current workaround options are the following:
Access DOKS services through their resolvable service names, or by using the desired service's ClusterIP.
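As an illustration of that first workaround (all names here are assumptions; adjust the namespace and service to your install): instead of hitting the load balancer's public IP from inside the cluster, target the ingress controller's in-cluster DNS name, for example:

```shell
# Run a throwaway pod and curl the ingress controller via its
# in-cluster service name instead of the public load balancer IP.
kubectl run curl-test --rm -it --image=curlimages/curl --restart=Never -- \
  curl -H "Host: sub.example.com" \
  http://ingress-nginx-controller.ingress-nginx.svc.cluster.local/.well-known/acme-challenge/test
```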
Or, on newer (1.14+) clusters, use this service annotation:
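The annotation itself appears to be missing from the post above; the annotation DO documents for this workaround is `service.beta.kubernetes.io/do-loadbalancer-hostname`. A sketch, assuming a standard ingress-nginx install and a hostname you control pointed at the LB:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: ingress-nginx-controller      # assumption: your ingress service name
  namespace: ingress-nginx
  annotations:
    # Exposes the LB to the cluster by hostname rather than IP, so
    # pod traffic goes out to the load balancer instead of being
    # short-circuited by kube-proxy on the LB's IP.
    service.beta.kubernetes.io/do-loadbalancer-hostname: "lb.example.com"
spec:
  type: LoadBalancer
  selector:
    app.kubernetes.io/name: ingress-nginx
  ports:
    - name: http
      port: 80
      targetPort: http
```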
If this is helpful: I've uninstalled cert-manager and am currently issuing my Let's Encrypt certs manually in my cluster.
The hope is that this bug will eventually get fixed, and then I'll reinstall cert-manager and continue as before.
I’ve posted how I do this here: https://github.com/nabsul/k8s-letsencrypt