By: nanos

Droplet has lost outbound connectivity

March 8, 2017
Networking Ubuntu 16.04

So, today unattended-upgrades asked me for a reboot. No big deal, I thought, and rebooted.

After rebooting, the droplet can no longer connect to the internet, although I can still connect to it from the internet (e.g. via SSH or HTTP).

Here is what I tried so far (I'm going to obfuscate the last octet of my IP addresses):

$ ifconfig
eth0      Link encap:Ethernet  HWaddr 3e:0a:12:4f:9d:64
          inet addr:46.101.39.xx  Bcast:46.101.63.yy  Mask:255.255.192.0
          inet6 addr: fe80::3c0a:12ff:fe4f:9d64/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7765 errors:0 dropped:0 overruns:0 frame:0
          TX packets:12573 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:856215 (856.2 KB)  TX bytes:4933667 (4.9 MB)

eth1      Link encap:Ethernet  HWaddr aa:61:fa:a4:81:77
          inet addr:169.254.72.zz  Bcast:169.254.255.255  Mask:255.255.0.0
          inet6 addr: fe80::759e:acbc:907d:ef91/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:148 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 B)  TX bytes:46798 (46.7 KB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:3091 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3091 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:257445 (257.4 KB)  TX bytes:257445 (257.4 KB)

$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         46.101.0.1      0.0.0.0         UG    202    0        0 eth0
10.16.0.0       0.0.0.0         255.255.0.0     U     0      0        0 eth0
46.101.0.0      0.0.0.0         255.255.192.0   U     202    0        0 eth0
169.254.0.0     0.0.0.0         255.255.0.0     U     203    0        0 eth1

$ ping -c3 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.

--- 8.8.8.8 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms

$ ping -c3 46.101.0.1
PING 46.101.0.1 (46.101.0.1) 56(84) bytes of data.

--- 46.101.0.1 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2015ms

And here are a few observations:

  • I have no idea at all what the eth1 interface is. It's not in /etc/network/interfaces, and I can't remember having seen it before (but then I never had the need to do network debugging on this droplet before).
  • I cannot bring eth1 down using ifdown (Unknown interface eth1), though ifconfig eth1 down works (and doesn't make any difference).
  • I cannot even ping my default gateway.
  • I use ufw firewall, but it's allowing all outbound connections. tail -F /var/log/syslog while doing pings doesn't show anything being logged (logging is enabled for ufw).
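The gateway check from the output above can be scripted so it does not depend on reading the routing table by eye. This is only a minimal sketch: `default_gateway` is a hypothetical helper name, and it assumes the `route -n` column layout shown in the question.

```shell
#!/bin/sh
# Sketch: extract the default gateway from `route -n` style output on stdin.
# default_gateway is a hypothetical helper name, not a standard tool.
default_gateway() {
    # The default route has destination 0.0.0.0 and carries the G (gateway) flag.
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { print $2; exit }'
}

# Typical use on the Droplet (commented out so sourcing this file has no side effects):
#   gw=$(route -n | default_gateway)
#   ping -c3 "$gw"
```

A gateway that does not answer pings is not proof of a local problem on its own, but together with working inbound traffic it points towards something on the Droplet rewriting or dropping outbound packets.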

I'm getting desperate here. Does anyone have any suggestions?

2 Answers
jtittle1 March 8, 2017
Accepted Answer

@nanos

To make sure the firewall isn't the cause, can you run:

sudo ufw disable

Then try to connect again. If it so happens that ufw was the actual cause, then it's most likely some sort of misconfiguration that we can easily resolve by flushing the current rules and then setting new rules up.

To do that, we'd first run:

sudo ufw reset

Then set up our new rules (as that just flushed all the old ones).

sudo ufw default deny incoming
sudo ufw default allow outgoing

With the basic incoming/outgoing rules set, we now need to define the ports we will allow connections on. In this example, I'll use 22 (SSH), 80 (HTTP) and 443 (HTTPS).

You can add any other ports that you need to the list.

sudo ufw allow 22/tcp
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp

Finally, we'll re-enable ufw and confirm that we want to enable it.

sudo ufw enable
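
The steps above could be wrapped into one function, roughly as sketched below. `rebuild_ufw` and the `UFW` override variable are hypothetical names introduced here for illustration; `--force` is used so that `reset` and `enable` do not stop to ask for confirmation.

```shell
#!/bin/sh
# Sketch: rebuild a minimal ufw rule set in one go.
# UFW is a hypothetical override so the steps can be dry-run with UFW=echo.
UFW="${UFW:-sudo ufw}"

rebuild_ufw() {
    # $UFW is left unquoted on purpose so "sudo ufw" splits into two words.
    $UFW --force reset
    $UFW default deny incoming
    $UFW default allow outgoing
    # Every argument is treated as a TCP port to open.
    for port in "$@"; do
        $UFW allow "${port}/tcp"
    done
    $UFW --force enable
}
```

For example, `rebuild_ufw 22 80 443` reproduces the rules listed above. Setting `UFW=echo` first prints the commands instead of running them.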
  • Thanks. Yes, I actually did disable ufw as a test and was still having the same issues. (I meant to mention that as the last point, but forgot...)

    • @nanos

      Slightly odd. To make sure no other iptables rules are causing the block, can you try iptables -F or iptables --flush? That should delete all rules that may be in place and blocking. Since ufw is a front end to iptables, perhaps that will work or, at the very least, rule out the firewall.

      What does seem odd is that you seem to have two public IP addresses, at least from the output of ifconfig. You're showing one address in 46.101 and one in 169.254, neither of which is in the private range I'd expect (that would be 10.x).

      Which of those is the actual IPv4 of the Droplet?
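
One quick way to sanity-check what kind of address an interface holds is to compare it against the well-known reserved ranges. The sketch below uses a hypothetical helper, `classify_ipv4`, and only covers a few ranges. Note that 169.254.0.0/16 is the IPv4 link-local range (RFC 3927), which interfaces commonly self-assign when they get no other configuration, so the eth1 address is probably not a second public IP.

```shell
#!/bin/sh
# Sketch: rough classification of an IPv4 address by well-known ranges.
# classify_ipv4 is a hypothetical helper and does not validate its input.
classify_ipv4() {
    case "$1" in
        10.*|192.168.*)                         echo "private" ;;    # RFC 1918
        172.1[6-9].*|172.2[0-9].*|172.3[01].*)  echo "private" ;;    # RFC 1918, 172.16.0.0/12
        169.254.*)                              echo "link-local" ;; # RFC 3927
        127.*)                                  echo "loopback" ;;
        *)                                      echo "other" ;;      # possibly public
    esac
}
```

The last octets in the thread are obfuscated, so any full address passed in (for example `classify_ipv4 169.254.72.10`) is a placeholder.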

      • Do you know of a good and reliable way of backing up/restoring iptables before flushing? I'd be fairly confident I can restore ufw manually, but iptables - not so much...

        I fully agree with your thoughts on the two public IPs. The 46.xx one is the actual public IP. As I said: I have no idea where the other one is coming from...

        • @nanos

          Regarding the two IPs, honestly, I would submit a support ticket through the control panel and see if support can look into that for you.

          It's not uncommon to see entries such as eth0, eth0:2, eth1 (Private Network, in my case), and then lo for the local loopback, though I've not seen two public IPv4 addresses assigned to a single Droplet -- even where Floating IPs are in the mix -- as they don't get physically assigned to the Droplet (in terms of showing up when running ifconfig).

          As far as backing up iptables rules, to save them:

          iptables-save > rules.txt
          

          or

          iptables-save > /path/to/rules.txt
          

          Then to restore, you'd use:

          iptables-restore < rules.txt
          

          or

          iptables-restore < /path/to/rules.txt
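
A timestamped variant of the save step might look like the sketch below, so that repeated backups do not overwrite each other. `backup_iptables` and the `IPTABLES_SAVE` override are hypothetical names. Keep in mind that `iptables-save` dumps every active table (including nat), while `iptables -F` without `-t` only flushes the filter table.

```shell
#!/bin/sh
# Sketch: timestamped iptables backup.
# IPTABLES_SAVE and backup_iptables are hypothetical names; override
# IPTABLES_SAVE to dry-run without root.
IPTABLES_SAVE="${IPTABLES_SAVE:-sudo iptables-save}"

backup_iptables() {
    dir="${1:-.}"
    file="$dir/iptables-$(date +%Y%m%d-%H%M%S).rules"
    # Write the dump, then print the file name so the caller can restore from it.
    $IPTABLES_SAVE > "$file" && echo "$file"
}
```

Usage would be along the lines of `f=$(backup_iptables /root)` before flushing, and `sudo iptables-restore < "$f"` to roll back.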
          
          • OK. Thanks a lot. It's getting a bit late now, and I don't fancy a long night, so I'll pick this up tomorrow morning. Thanks for your help so far, and I'll definitely report back.

          • Thanks again for your reply. I tried that and it still doesn't work. I also opened a ticket with DO (both for clarification of eth1 and the networking issue) and after a bit of debugging they have given up:

            "Thank you for your reply. The eth1 interface will normally be the Droplet's private network interface, while eth0 is assigned to the public interface. I do apologize, at this time I do not have a recommended solution to this networking issue. Do you have any snapshots this Droplet was created from in known working order? "

            As it stands I will likely just restore a few days old backup.

            edit: "Given up" sounds harsher than I mean it: I'm not blaming them - unless it's a hardware issue (unlikely given that it only affects outgoing connections) - it's not their business to troubleshoot my configuration.

          • OK, I think I have found the root cause, but I'm lost for a fix.

            After restoring the last snapshot, the new server still had the same issue.

            More or less by coincidence, after a lot of googling, I found the following POSTROUTING rule in my iptables nat table:

            
            $ sudo iptables -t nat -v -L POSTROUTING -n --line-number
            Chain POSTROUTING (policy ACCEPT 5 packets, 348 bytes)
            num   pkts bytes target     prot opt in     out     source               destination
            1      124  8956 SNAT       all  --  *      eth+    0.0.0.0/0            0.0.0.0/0            to:67.207.71.xxx
            
            

            (67.207.71.xxx is my floating IP)

            I can remove this by doing iptables -t nat -D POSTROUTING 1 and suddenly outbound traffic works again.

            I am now trying to make this persist by doing iptables-save > /etc/iptables/rules.v4 (yes, I have installed iptables-persistent for this purpose) but after a reboot the rule is back. I have no idea why (I checked, and it's not in /etc/iptables/rules.v4)

            Any thoughts on where this rule could be coming from?

            [Edit: I should mention I did check /etc/ufw/before.rules and it's not in there]

            [Edit again: I finally remember that I put that into the /etc/rc.local file .... Talk about stupidity ...]
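
When a rule keeps coming back after a reboot, searching the usual boot-time locations for it can save some head-scratching. The following is a rough sketch: `find_rule_source` is a hypothetical helper, and the default path list is an assumption beyond the files mentioned in this thread (/etc/rc.local, /etc/ufw/before.rules, /etc/iptables/rules.v4).

```shell
#!/bin/sh
# Sketch: search boot-time locations for a stray iptables rule.
# find_rule_source is a hypothetical helper; the default path list is an assumption.
find_rule_source() {
    pattern="$1"; shift
    # Fall back to a few common places when no paths are given.
    [ "$#" -gt 0 ] || set -- /etc/rc.local /etc/ufw /etc/iptables /etc/network
    # -r recurse, -n line numbers, -s stay quiet about missing files, -H always print file names.
    grep -rnsH -- "$pattern" "$@"
}
```

For example, `find_rule_source SNAT` run from a root shell prints each matching file and line number. Some of these paths may not be readable by a regular user.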

          • So, to recap:

            1. eth1 is the local network interface. I still haven't got an explanation for the IP address, but I'll leave that for now.
            2. The outage was caused by a POSTROUTING SNAT rule in the iptables nat table.
            3. The rule came back on every boot because I had put that directive into /etc/rc.local.

            A HUGE thank you for your help and pointers, @jtittle! Wouldn't have managed without you!!!

@nanos

No problem, glad I was able to help a bit, though the final resolution was all you :-).

One thing to note, for future reference, is that snapshots are full-state backups. This means they capture the state of the machine as it is when you run the action. When you restore a snapshot, it restores the state as it was when the snapshot was taken, which is why restoring one will not help when the problem is already part of that state.

Think of a snapshot as an image (such as an ISO). It creates an image of the entire machine, so when it comes to networking, that'll come along with it. One of the IPs may change (the main one) if it's restored to a Droplet with a different IP, but any other networking configuration that was in place will remain.

For that very reason, I normally rely on on-server backups, block storage (to transfer the backups to), and other means of backup. In some cases, it's simply better to start from scratch. It can be a pain, but that's one reason I started creating bash scripts to automate these things a long time ago.
