CoreOS Setup: Error 503: fleet server unable to communicate with etcd

June 6, 2016
Getting Started Clustering Networking Configuration Management CoreOS

I'm trying to follow the DigitalOcean CoreOS tutorial, but I can't get past its "Verify Cluster" section. Specifically, when I run fleetctl list-machines, I receive the following error:

$ fleetctl list-machines
Error retrieving list of active machines: googleapi: Error 503: fleet server unable to communicate with etcd

When I created the 3 droplets, I entered this user data verbatim in the web interface:

#cloud-config

coreos:
  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new:
    discovery: https://discovery.etcd.io/d8ea4388ae9d4d41818b88f49c8ed80c
    # multi-region deployments, multi-cloud deployments, and Droplets without
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start

It closely matches the tutorial's configuration. I created 3 droplets in the nyc2 datacenter. The discovery URL was generated with the size parameter set to 3.
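For reference, this is roughly how a discovery URL for a 3-member cluster is requested. It is only a sketch: the helper name discovery_new_url is made up for illustration, and the actual request (shown as a comment) needs network access to the public discovery service.

```shell
# Hypothetical helper: build the request URL for a new discovery token
# for a cluster of the given size.
discovery_new_url() {
  echo "https://discovery.etcd.io/new?size=$1"
}

# Print the request URL for a 3-member cluster.
discovery_new_url 3

# To actually request a token (needs network access):
#   curl -w "\n" "$(discovery_new_url 3)"
```

The URL returned by the service is what goes into the discovery: line of the cloud-config, and each new cluster needs its own token.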

I'm not sure what the issue is, but here is the status output of the relevant services:

$ systemctl status fleet
● fleet.service - fleet daemon
   Loaded: loaded (/usr/lib64/systemd/system/fleet.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/fleet.service.d
           └─20-cloudinit.conf
   Active: active (running) since Sun 2016-06-05 23:56:54 UTC; 1h 7min ago
 Main PID: 1144 (fleetd)
   Memory: 10.1M
      CPU: 199ms
   CGroup: /system.slice/fleet.service
           └─1144 /usr/bin/fleetd

Jun 05 23:56:54 coreos-512mb-nyc2-01 systemd[1]: Started fleet daemon.
Jun 05 23:56:54 coreos-512mb-nyc2-01 fleetd[1144]: INFO fleetd.go:64: Starting fleetd version 0.11.7
Jun 05 23:56:54 coreos-512mb-nyc2-01 fleetd[1144]: INFO fleetd.go:168: No provided or default config file found - proceeding without
Jun 05 23:56:54 coreos-512mb-nyc2-01 fleetd[1144]: INFO server.go:157: Establishing etcd connectivity

$ systemctl status etcd2
● etcd2.service - etcd2
   Loaded: loaded (/usr/lib64/systemd/system/etcd2.service; disabled; vendor preset: disabled)
  Drop-In: /run/systemd/system/etcd2.service.d
           └─20-cloudinit.conf
   Active: active (running) since Sun 2016-06-05 23:56:54 UTC; 1h 8min ago
 Main PID: 1135 (etcd2)
   Memory: 24.2M
      CPU: 18.091s
   CGroup: /system.slice/etcd2.service
           └─1135 /usr/bin/etcd2

Jun 06 01:05:18 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef is starting a new election at term 3020
Jun 06 01:05:18 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef became candidate at term 3021
Jun 06 01:05:18 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef received vote from 4692ba3abd59cdef at term 3021
Jun 06 01:05:18 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef [logterm: 1, index: 3] sent vote request to afc0c7d0eccba6c at term 3021
Jun 06 01:05:18 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef [logterm: 1, index: 3] sent vote request to d6907df338461404 at term 3021
Jun 06 01:05:19 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef is starting a new election at term 3021
Jun 06 01:05:19 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef became candidate at term 3022
Jun 06 01:05:19 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef received vote from 4692ba3abd59cdef at term 3022
Jun 06 01:05:19 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef [logterm: 1, index: 3] sent vote request to afc0c7d0eccba6c at term 3022
Jun 06 01:05:19 coreos-512mb-nyc2-01 etcd2[1135]: 4692ba3abd59cdef [logterm: 1, index: 3] sent vote request to d6907df338461404 at term 3022

If you couldn't tell, I'm a bit of a newbie when it comes to CoreOS and its configuration. Can anyone give me some guidance as to why fleet cannot connect to etcd2?

1 Answer

Within a few hours of posting this, I received a notification from the DigitalOcean Team saying that the droplets I created had an invalid networking configuration. As a result, the droplets were not able to communicate with each other.
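This fits the etcd2 log in the question: the node keeps starting elections, votes for itself, and sends vote requests to the two other members, but the log never shows a vote coming back. As far as I understand it, etcd needs a majority of members to elect a leader, so a node that can only reach itself never gets there. A small sketch of that arithmetic (the function name quorum is just for illustration):

```shell
# Majority needed by a cluster of N members: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# A 3-member cluster needs 2 votes; an isolated node only has its own.
quorum 3
```

With one vote out of the required two, each election times out and the term number keeps climbing, which would explain why the log was already past term 3000.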

In case anyone else is wondering whether this affects them, here is the notification I received, including the resolution options:

[DigitalOcean] Action Required on Your Recently Created Droplet

Hi there,

We’ve identified an issue with your recently created Droplet(s). Droplets created between 18:18 UTC and 23:57 UTC on June 5, 2016 were created with an invalid networking configuration.

As a result, networking will not work properly on your Droplet and regretfully we have had to disable networking on your Droplet.

In order to use your Droplet, you will need to destroy and re-create it, for which there are a few options:

1) Snapshot and Re-create: If you want to retain the data on your Droplet, please take a snapshot and redeploy from the snapshot.

2) Re-create from your preferred image: If you do not need any data on the Droplet, please destroy and re-create from your preferred image.

Locating the Droplet(s) to destroy: You can identify which Droplet(s) to destroy by looking at the Droplet ID, it will be within the following range: 16777116 to 16792599

We apologize for the service interruption you experienced today. This is our mistake and we are committed to making this right for you. Following the full resolution of this issue and an internal post-mortem to identify the root cause, we will be issuing SLA credits, at which time you will receive an email notification once they’re applied to your account.

Should you have any questions about the required steps, or to identify which Droplet(s) is/are affected please reply to this email and we’ll be happy to assist.

Thank you, Team DigitalOcean

The status page has also noted this issue: Status Page

After recreating the droplets, everything seems to be working now.
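If you want to double-check a recreated cluster, something like the following could be run on one of the nodes. It is a rough sketch, not an official procedure: count_machines is a helper I made up, it assumes the first line of the fleetctl list-machines output is a header row, and expected=3 matches my cluster size.

```shell
# Hypothetical helper: count the machines in `fleetctl list-machines`
# output read from stdin, skipping the header row and blank lines.
count_machines() {
  awk 'NR > 1 && NF > 0 { n++ } END { print n + 0 }'
}

expected=3   # assumed cluster size

# Only run the checks where the CoreOS tools are installed.
if command -v fleetctl >/dev/null 2>&1; then
  actual=$(fleetctl list-machines | count_machines)
  if [ "$actual" -eq "$expected" ]; then
    echo "fleet sees all $expected machines"
  else
    echo "fleet sees $actual of $expected machines"
  fi
fi

if command -v etcdctl >/dev/null 2>&1; then
  # Reports the health of each etcd member and of the cluster.
  etcdctl cluster-health
fi
```

If fleetctl still returns the 503 error, the count comes out as 0, so the script reports "0 of 3" rather than failing silently.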
