DNS failure: network or configuration issue?

February 4, 2019
DNS

DNS and gateway configuration: outage?

I’ve got an Ubuntu 18.04 server Droplet deployed, and its DNS resolution is toast. I can’t access my server over ssh. I can only access it using the web-based terminal emulator provided by DigitalOcean, so I can’t copy and paste output. I need to use screenshots.

Here is a screenshot of apt showing a DNS issue: https://i.imgur.com/KPw0WVV.png
Here is the output of systemd-resolve --status: https://i.imgur.com/j5lhfY3.png
Here is the output of ip address: https://i.imgur.com/0crGJIT.png
The only file inside the netplan directory is 50-cloud-init.yaml. The contents of this file can be found here: https://i.imgur.com/fUs4BzZ.png
$ ping 104.131.0.1 produces: “Destination Host Unreachable”
$ ip -4 route produces: 104.131.0.0/18 dev eth0 proto kernel scope link src 104.131.14.165 and 10.17.0.0/16 dev eth0 proto kernel scope link src 10.17.0.5
$ sudo ip -4 route add default via 104.131.14.1 dev eth0 produces: “RTNETLINK answers: File exists”
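
The ip -4 route output quoted above lists two link-scope routes and no line beginning with "default". A quick way to check for that condition is sketched below. The helper name has_default_route is made up for illustration, and the sketch only tests whether a default route is present in the text it is given; it does not say whether the gateway behind it is reachable.

```shell
# Succeeds (exit status 0) if the routing table read from stdin
# contains a default route, i.e. a line starting with "default".
has_default_route() {
    grep -q '^default[[:space:]]'
}

# Report on the live IPv4 routing table of the current machine.
report_default_route() {
    if ip -4 route | has_default_route; then
        echo "default route present"
    else
        echo "no default route"
    fi
}
```

Calling report_default_route on the Droplet prints one of the two messages. Feeding the two routes quoted above into has_default_route makes it fail, because neither line starts with "default".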

A DO user with a similar issue asked last year in March but no one replied: https://www.digitalocean.com/community/questions/temporary-failure-resolving-digitalocean-mirrors

Is this a failure on the DigitalOcean network? If it is not an outage, what could be wrong with my configuration? What other information could I provide to better diagnose the issue?

Thanks,

3 Answers

Hey friend,

Sorry to hear about the trouble this is giving you. I see that Javi replied to a ticket on this. I also had a second look at it, as well as one of our engineers. I just wanted to make sure that we were all on the same page, and we all came to the conclusion that the reply sent to the ticket represents the correct path to begin troubleshooting.

It’s at least good for us to make sure there are no platform issues causing such things, and to be extra certain of that.

Jarland

  • I need to preface this post with an apology.

    I have some more information to share about the way my system is configured.

    I’m not a very advanced network sysadmin so I’ve been making changes to configuration files without really understanding very well what I am doing.

    Based on an answer I found involving a similar DNS resolution issue on StackExchange, a few nights ago I altered my /etc/network/interfaces file so it looks like this: https://i.imgur.com/ahA9kmH.png
    As you can see in that image, I commented out the default configuration lines and added my own. The active lines in operation became simply:

    nameserver 8.8.8.8
    nameserver 8.8.4.4
    dns-nameservers 8.8.8.8
    

    Yes, it is redundant and awkward. I suppose this could be the problem here?

    When I comment out those three lines restoring the defaults (also showing in the above screenshot) and reboot my droplet, I seem to get a different DNS error when trying to run apt. Here is what apt looks like now: https://i.imgur.com/ZPmJLfK.png

    So when I think I have solved the problem, I get nowhere.

    Javier replied promptly through the email support ticket I opened and suggested that the issue could be with the way my firewall is configured. He explained that there are no conflicting cloud firewalls on the DO network. So to test it out, I disabled ufw completely and rebooted, and apt still encounters a DNS failure (same as above): https://i.imgur.com/ZPmJLfK.png

    Is there any other information I could provide to sort this out or provide further clarity?

    Thanks, Jarland and Javier for your support and patience so far as I work through this.

    • No worries, sometimes breaking things is a good thing to do. I’m still not convinced you’ve actively or intentionally broken anything though. Try “ping 8.8.8.8” and see if that fails. I think your whole networking is failing, and DNS failure is just a symptom of that. I don’t think changing resolv.conf broke this though. Were there any other files changed by chance?
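
On the resolv.conf question, one way to see which resolvers a file actually configures is to print only its nameserver lines. This is a minimal sketch, and the helper name list_nameservers is made up for illustration. It reads resolv.conf-style text and ignores everything else, including dns-nameservers, which is not a resolv.conf keyword.

```shell
# Print the address from every "nameserver <address>" line on stdin.
# Lines whose first field is anything else are skipped.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }'
}
```

Typical use would be list_nameservers < /etc/resolv.conf. Assuming a stock Ubuntu 18.04 install running systemd-resolved, that file normally lists only the local stub resolver at 127.0.0.53, and the upstream servers appear in /run/systemd/resolve/resolv.conf instead.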

I recall changing resolv.conf and playing with some other commands and cat’ing other configuration files. I don’t recall exactly what other changes I may or may not have made. I’ve sifted through my bash history and there are no timestamps so it’s hard for me to pinpoint exactly. Based on what I see, I can’t tell which other configurations I changed, if at all.

With my firewall turned off, pinging 8.8.8.8 seems to work. No packet loss. But pinging www.google.com yields: Temporary failure in name resolution. Another member of the Ubuntu community on FreeNode suggested I try: ip -4 route add default via 104.131.14.1 dev eth0. The resulting output is RTNETLINK answers: File exists. See here for the raw output of all of the above: https://i.imgur.com/cmlu7B9.png

Since pinging 8.8.8.8 works, what does that show or indicate from your perspective?
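
When a ping to a numeric address succeeds but a lookup by name fails, packets are leaving the machine and coming back, so the failure sits in name resolution rather than in basic connectivity. The sketch below separates the two cases. The function names diagnose and run_probes are made up for illustration, and the messages are rough hints, not a definitive diagnosis.

```shell
# Turn two probe results into a rough hint.
# $1: exit status of the probe against a numeric IP address
# $2: exit status of the name lookup
diagnose() {
    if [ "$1" -ne 0 ]; then
        echo "no IP connectivity: check routes, gateway, and firewall"
    elif [ "$2" -ne 0 ]; then
        echo "IP works, name resolution fails: check the resolver configuration"
    else
        echo "IP and DNS both work"
    fi
}

# Run both probes using the address and hostname from this thread.
run_probes() {
    # Three echo requests, waiting at most two seconds for each reply.
    ping -c 3 -W 2 8.8.8.8 >/dev/null 2>&1
    ip_status=$?
    # getent resolves the name through the system's normal lookup path.
    getent hosts www.google.com >/dev/null 2>&1
    dns_status=$?
    diagnose "$ip_status" "$dns_status"
}
```

Calling run_probes prints one of the three messages. With the results described above (ping to 8.8.8.8 succeeds, lookup of www.google.com fails), it would print the second one.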

For what it’s worth, the only file present in /etc/netplan/ is 50-cloud-init.yaml. Its contents can be found here: https://i.imgur.com/tfnzcTo.png

This isn’t much of a solution or answer. The purpose of this post is to document the situation after I had to nuke my droplet and start over.

For the better part of 2 weeks I had been going back and forth with tech support reps with DigitalOcean privately over email. We couldn’t figure it out.

To recap the issue, here are all the details.

When running apt, the original DNS issue is still present. See here: https://i.imgur.com/kHaFNHF.png

An escalations rep had me paint a more comprehensive picture of my networking stack. Here are the commands he suggested and their output:

$ cat /etc/netplan/50-cloud-init.yaml: https://i.imgur.com/jumdfze.png

$ ip addr: https://i.imgur.com/qLLXncr.png

$ ip route, uname -a, ls -l /lib/modules: https://imgur.com/HuG1EhN.png

$ cat /etc/udev/rules.d/70-persistent-net.rules: This file doesn’t exist. The only file found inside /etc/udev/rules.d/ is a file named 99-digitalocean-automount.rules. Its contents are 30 lines long. Here are two screenshots:
Lines 1-19: https://i.imgur.com/5ABdgcn.png
Lines 19-30: https://i.imgur.com/RqCiQJL.png

$ sudo iptables -nvL --line-numbers: I can’t take screenshots of this because the output is some 180 lines long. If I had ssh access, this would be easy. But for now I am stuck with the web-based Droplet Console, so the only way for me to convey information is with screenshots.

The Developer Support Escalations rep responded with:

All of that information looks fine as well, so again it’s a bit of a head-scratcher here. Can you try taking a snapshot of this Droplet and then creating a new Droplet from that snapshot - then get back to us if it has the same issues? I’ve added a bit of credit to your account to pay for this testing.

So I took a snapshot and created a new Droplet from it. On the new Droplet, apt is still broken. See here: https://i.imgur.com/UvXxf21.png

I floated the idea of filing a bug report upstream. Another DO user encountered the same issue last year on 16.04: https://www.digitalocean.com/community/questions/temporary-failure-resolving-digitalocean-mirrors
Unfortunately this user never received an answer. I found users with similar issues going as far back as 14.04.

I’ve resolved to nuke my Droplet and start over. Lucky for me my LAMP stack is easy to set up so it’s only a minor inconvenience.
