CoreOS: Only 2 of my 3 machines are showing in "list-machines"

November 21, 2014 695 views

Hi There,

I have attempted to set up a CoreOS cluster using 3 DigitalOcean droplets across various datacentres. When I run "fleetctl list-machines" on the leader machine (LDN1), it gives the following response:

core@ldn1 ~ $ fleetctl list-machines
035d1637... host=LDN1,location=lon1,public_ip=178.62.10.*,region=lon
0c948a60... host=AMS1,location=ams3,public_ip=128.199.5.,region=ams

My issue is that it is not displaying my New York machine, which uses the same discovery URL as the others. Yet when I run "curl -L" it shows the NY machine:

core@ldn1 ~ $ curl -L

{"leader":"LDN1","followers":{"AMS1":{"latency":{"current":8.961378,"average":9.948075820900373,"standardDeviation":1.6590218277732314,"minimum":8.478871,"maximum":52.14463},"counts":{"fail":0,"success":6220}},"NYC1":{"latency":{"current":0,"average":0,"standardDeviation":0,"minimum":9.223372036854776e+18,"maximum":0},

If anybody can shed some light on this it would be greatly appreciated! I've been here since 9 trying to configure this correctly and it's starting to get a bit tedious.

Kind Regards,

1 Answer

Unfortunately, running a cross-datacenter cluster isn't a well-supported configuration. You might have some luck adjusting etcd's peer-election-timeout and peer-heartbeat-interval values. From the CoreOS etcd docs:

The default settings in etcd should work well for installations on a local network where the average network latency is low. However, when using etcd across multiple data centers or over networks with high latency you may need to tweak the heartbeat interval and election timeout settings.
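As a rough starting point for picking those two values, here is a minimal Python sketch. It assumes a rule of thumb (an election timeout of about 10 times the ping time between machines, and about 5 times the heartbeat interval); the function name, the factors, and the 80 ms ping time are hypothetical, so measure your own latency and treat the output as a first guess rather than a recommendation.

```python
import math


def suggest_etcd_timing(ping_ms, election_factor=10, heartbeat_ratio=5):
    """Return (heartbeat_interval_ms, election_timeout_ms) for a ping time.

    Assumed rule of thumb, not an official formula: the election timeout is
    about `election_factor` times the ping time, and about `heartbeat_ratio`
    times the heartbeat interval.
    """
    if ping_ms <= 0:
        raise ValueError("ping_ms must be positive")
    election_timeout = math.ceil(ping_ms * election_factor)
    heartbeat_interval = max(1, election_timeout // heartbeat_ratio)
    return heartbeat_interval, election_timeout


# Hypothetical ping time in milliseconds; replace with a measured value.
heartbeat, election = suggest_etcd_timing(ping_ms=80)
print(f"peer-heartbeat-interval={heartbeat}")  # 160
print(f"peer-election-timeout={election}")     # 800
```

Both settings are in milliseconds and would need to be applied consistently on every machine in the cluster.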

This guide might help you dig into some more debugging:

CoreOS is an extremely powerful operating system focused on cluster management, security, and containerized service deployments. However, the unconventional way that the system is set up can make troubleshooting somewhat difficult. In this guide, we'll cover the basics of how to track down issues in your deployment as well as your services.
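One quick check you can do yourself is to look for followers in the leader stats output that have never recorded a latency sample. The sketch below is only an illustration: the helper name is hypothetical, the sample reuses the values from the question (closed off by hand, since the original output was cut short), and it assumes that an average of 0 combined with a minimum of roughly 2^63 means no heartbeat reply was ever measured.

```python
import json

# Assumption: a "minimum" of about 2**63 is an untouched initial value,
# i.e. no latency sample was ever recorded for that follower.
NO_SAMPLE_SENTINEL = float(2**63 - 1)


def silent_followers(leader_stats):
    """Return names of followers that have no recorded latency samples."""
    silent = []
    for name, stats in leader_stats.get("followers", {}).items():
        latency = stats.get("latency", {})
        never_measured = (
            latency.get("average", 0) == 0
            and latency.get("minimum", 0) >= NO_SAMPLE_SENTINEL
        )
        if never_measured:
            silent.append(name)
    return silent


# Values copied from the question; the closing braces are added by hand.
sample = """
{"leader":"LDN1","followers":{
  "AMS1":{"latency":{"current":8.961378,"average":9.948075820900373,
          "standardDeviation":1.6590218277732314,"minimum":8.478871,
          "maximum":52.14463},"counts":{"fail":0,"success":6220}},
  "NYC1":{"latency":{"current":0,"average":0,"standardDeviation":0,
          "minimum":9.223372036854776e+18,"maximum":0}}}}
"""

print(silent_followers(json.loads(sample)))  # ['NYC1']
```

If NYC1 shows up here, the leader knows about the machine but has apparently never received a reply from it, which would be consistent with a connectivity or timeout problem between the datacenters.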