How To Set Up a CoreOS Cluster on DigitalOcean

Published on September 5, 2014
By Mitchell Anicas
Developer and author at DigitalOcean.
This tutorial is out of date and no longer maintained.

Status: Out of Date

This article is no longer current. If you are interested in writing an update for this article, please see DigitalOcean wants to publish your tech tutorial!

Reason: On December 22, 2016, CoreOS announced that it no longer maintains fleet. CoreOS recommends using Kubernetes for all clustering needs.

See Instead: For guidance using Kubernetes on CoreOS without fleet, see the Kubernetes on CoreOS Documentation.

Introduction

If you are planning on using CoreOS in your infrastructure, the first thing you will want to set up is a CoreOS cluster. In order for CoreOS machines to form a cluster, their etcd2 instances must be connected. In this tutorial, we will give step-by-step instructions to quickly create a 3-node CoreOS cluster on DigitalOcean.

Prerequisites

If you are unfamiliar with the components that CoreOS is built on (docker, etcd2, and fleet) it is highly recommended that you read An Introduction to CoreOS System Components. You will want to pay particular attention to the section that covers etcd2, since that component is essential to the cluster discovery process.

SSH Keys

Every CoreOS server that you create will need to have at least one SSH public key installed during its creation process. The key(s) will be installed to the core user’s authorized keys file, and you will need the corresponding private key(s) to log in to your CoreOS server.

If you do not already have any SSH keys associated with your DigitalOcean account, add one now by following steps 1-3 of this tutorial: How To Use SSH Keys with DigitalOcean Droplets. Then add your private key to the SSH agent on your client machine by running the following command:

ssh-add

For more about this step, see this article.

DigitalOcean Personal Access Token

If you are planning on using the DigitalOcean API to create your CoreOS machines, refer to this tutorial for information on how to generate and use a Personal Access Token with write permissions.

Now that you have the prerequisites out of the way, let’s start building our CoreOS cluster!

Generate a New Discovery URL

The first step to setting up a new CoreOS cluster is generating a new discovery URL, a unique address that stores peer CoreOS addresses and metadata. The easiest way to do this is to use https://discovery.etcd.io, a free discovery service. A new discovery URL can be generated by visiting https://discovery.etcd.io/new in a web browser or by running the following curl command:

curl -w "\n" "https://discovery.etcd.io/new?size=3"

Either method will return a fresh, unique discovery URL that looks something like the following (the part after the final slash is a unique token):

https://discovery.etcd.io/5c1574906b3502aa9d8dc43c1b185775

You will use your resulting discovery URL to create your new CoreOS cluster. The same discovery URL must be specified in the etcd2 section of the cloud-config of each server that you want to add to a particular CoreOS cluster.
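To make the shape of a discovery URL concrete, here is a minimal Python sketch that splits one into its host and token and rejects anything that does not look like the example above. The function name discovery_token and the 32-character hexadecimal check are assumptions for illustration, based only on the sample URLs shown in this tutorial; the discovery service itself may accept other formats.

```python
import re
from urllib.parse import urlparse


def discovery_token(url: str) -> str:
    """Return the token part of a discovery URL, or raise ValueError."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.netloc != "discovery.etcd.io":
        raise ValueError(f"not a discovery.etcd.io URL: {url!r}")
    token = parsed.path.strip("/")
    # Assumption: tokens look like the 32-character hex example in this tutorial.
    if not re.fullmatch(r"[0-9a-f]{32}", token):
        raise ValueError(f"unexpected token format: {token!r}")
    return token


if __name__ == "__main__":
    print(discovery_token("https://discovery.etcd.io/5c1574906b3502aa9d8dc43c1b185775"))
```

A check like this can catch a truncated or mistyped URL before it is pasted into several cloud-config files.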

Now that we have a discovery URL, let’s look at how to create a cloud-config file that uses it.

Write a Cloud-Config File

CoreOS uses a file called cloud-config which allows you to declaratively customize network configuration, systemd units, and other OS-level items. This file is written in YAML format, which uses indentation to denote data hierarchy. The cloud-config file is processed when a machine is booted, and provides a way to configure your machines with etcd2 settings that will allow them to discover the cluster that they should join.

We will cover how to write a minimal cloud-config to get a working CoreOS cluster up and running. For a full list of items that can be configured with cloud-config, check out the official documentation. They also provide a helpful tool that can check your cloud-config file’s syntax, Cloud-Config Validator.

Minimal Cloud-Config

As mentioned earlier, the peer addresses of the CoreOS machines in a cluster are stored with the discovery URL. Therefore, each machine in a cluster must use the same discovery URL and pass in its own IP address where its etcd2 service can be reached. These are specified in cloud-config under the etcd2 section, and are shown in the code block below.

You will also need to specify a units section, which will start the etcd2 and fleet services that are necessary for a working CoreOS cluster.

Here is a basic cloud-config file that can be used with your CoreOS machines to make a new cluster (substitute the value of discovery with the discovery URL that you generated earlier):

#cloud-config

coreos:
  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new:
    discovery: https://discovery.etcd.io/<discovery_token>
    # multi-region deployments, multi-cloud deployments, and Droplets without
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start

Note: The #cloud-config line is required. The $private_ipv4 and $public_ipv4 substitution variables are fully supported in cloud-config on DigitalOcean; they will be replaced with the corresponding IP addresses of your new Droplet. Also, the fleet section is not required if you do not intend to use the fleetctl ssh command.

This cloud-config script can be used to set up a basic CoreOS cluster for testing purposes; unfortunately, it is not very secure, because etcd2 accepts unencrypted client connections on all interfaces. For a more serious setup, you should set up a secure CoreOS cluster by following this tutorial: How To Secure Your CoreOS Cluster with TLS/SSL and Firewall Rules.
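If you create clusters often, it can help to generate the cloud-config from a template rather than editing it by hand. The following Python sketch renders the minimal cloud-config from this section for a given discovery URL. The names CLOUD_CONFIG_TEMPLATE and render_cloud_config are hypothetical helpers, not part of any CoreOS tooling, and the private_networking switch simply follows the comment in the original file about using $public_ipv4 when private networking is unavailable.

```python
CLOUD_CONFIG_TEMPLATE = """\
#cloud-config

coreos:
  etcd2:
    discovery: {discovery_url}
    advertise-client-urls: http://{ip}:2379,http://{ip}:4001
    initial-advertise-peer-urls: http://{ip}:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://{ip}:2380
  fleet:
    public-ip: {ip}
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start
"""


def render_cloud_config(discovery_url: str, private_networking: bool = True) -> str:
    """Fill the discovery URL and the IP substitution variable into the template."""
    # The variable is left as literal text; it is replaced with the real
    # address on the Droplet when cloud-config is processed, not here.
    ip = "$private_ipv4" if private_networking else "$public_ipv4"
    return CLOUD_CONFIG_TEMPLATE.format(discovery_url=discovery_url, ip=ip)


if __name__ == "__main__":
    # Placeholder URL; substitute the discovery URL you generated earlier.
    print(render_cloud_config("https://discovery.etcd.io/<discovery_token>"))
```

The template deliberately contains no comment lines, which keeps every line short enough to survive pasting into a web form.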

Create CoreOS Cluster

Now that you know what your cloud-config file for each machine in your new CoreOS cluster will consist of, let’s create your CoreOS cluster. Because Droplets can be created through the DigitalOcean Control Panel or API, we will show you how to create your CoreOS cluster using both methods.

DigitalOcean Control Panel

First, visit the DigitalOcean Control Panel then click the Create Droplet button.

Next, select CoreOS as your Linux distribution, then select which channel you want to use (Stable, Beta, or Alpha).

Then select your desired Droplet size. A smaller size is fine if you’re doing basic testing.

Next, select your preferred datacenter region.

Under the Select additional options header, select Private Networking and User Data. Copy and paste your cloud-config script into the User Data text field. It should look something like this:

[Screenshot: User Data field]

Next, select at least one SSH key that you want to use to log in to your Droplets.

Under the Finalize and create section, create at least three Droplets and specify their hostnames. In our example, we’ll call them coreos-01, coreos-02, and coreos-03:

[Screenshot: Create 3 Droplets]

Lastly, click the Create button to create the Droplets that will form your CoreOS cluster.

To learn more about creating Droplets with the DigitalOcean Control Panel, refer to this guide.

DigitalOcean API

If you use the DigitalOcean API to create your CoreOS Droplets, you can specify your cloud-config via the user_data parameter in your Droplet creation POST request. Pass the whole script as the value of that parameter.

Let us assume that we want to create three 1 GB Droplets named coreos-01, coreos-02, and coreos-03 with private networking, in the NYC3 data center, using the CoreOS Stable channel image, and the cloud-config file shown earlier. Here is an example of the curl command you would run to create them using the DigitalOcean API:

curl -X POST "https://api.digitalocean.com/v2/droplets" \
      -d'{"names":["coreos-01","coreos-02","coreos-03"],"region":"nyc3","size":"1GB","private_networking":true,"image":"coreos-stable","user_data":
"#cloud-config

coreos:
  etcd2:
    # generate a new token for each unique cluster from https://discovery.etcd.io/new:
    discovery: https://discovery.etcd.io/<discovery_token>
    # multi-region deployments, multi-cloud deployments, and Droplets without
    # private networking need to use $public_ipv4:
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    # listen on the official ports 2379, 2380 and one legacy port 4001:
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start",
      "ssh_keys":[ <SSH Key ID(s)> ]}' \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json"

Note: This is just like a normal multi-Droplet create request, with the addition of the example cloud-config passed through the user_data parameter.

You must substitute your SSH Key ID(s) or fingerprint(s) for <SSH Key ID(s)>, and make sure $TOKEN is set to one of your read/write DigitalOcean Personal Access Tokens.

After running this command with the appropriate substitutions, your 3-node CoreOS cluster will be created.

For more information about using the API, please refer to this tutorial.
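Embedding a multi-line cloud-config inside a hand-written JSON body is error-prone, since quotes and line breaks inside a JSON string need to be escaped. As an alternative, here is a minimal Python sketch that builds the same request body with the standard json module, which handles the escaping. It only constructs and prints the body and does not send anything. The function name build_droplet_request is a hypothetical helper, the field values are copied from the curl example above, and the SSH key ID 12345 is a made-up placeholder.

```python
import json


def build_droplet_request(names, user_data, ssh_keys,
                          region="nyc3", size="1GB", image="coreos-stable"):
    """Return the JSON body for a multi-Droplet create request as a string."""
    payload = {
        "names": list(names),
        "region": region,
        "size": size,
        "private_networking": True,
        "image": image,
        "user_data": user_data,   # json.dumps escapes the newlines in here
        "ssh_keys": list(ssh_keys),
    }
    return json.dumps(payload)


if __name__ == "__main__":
    # Shortened cloud-config for illustration; use your full file in practice.
    user_data = (
        "#cloud-config\n\n"
        "coreos:\n"
        "  units:\n"
        "    - name: etcd2.service\n"
        "      command: start\n"
    )
    body = build_droplet_request(
        ["coreos-01", "coreos-02", "coreos-03"],
        user_data,
        ssh_keys=[12345],  # hypothetical SSH key ID
    )
    print(body)
```

The printed body could then be saved to a file and passed to curl with -d @filename, together with the same Authorization and Content-Type headers as in the command above.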

Verify Cluster

To verify that our 3-machine cluster has formed properly, we must SSH to one of the cluster members.

Log into the coreos-01 machine as the core user via SSH, and use the -A option to forward your SSH agent. Remember to substitute the public IP address:

ssh -A core@coreos-01_public_IP

At the command prompt, enter this fleetctl command to show all the members of the cluster:

fleetctl list-machines

You should see a list of all of the online machines in the cluster, identifiable by their respective peer-addr IP addresses. Here is an example of the output:

MACHINE		IP		METADATA
59b2fffd...	10.131.29.141	-
853b0df3...	10.131.63.121	-
cd64a2e3...	10.131.63.120	-

If you see all of the machines that you created, all of them are aware of each other via etcd2, and your cluster has formed properly!
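If you want to automate this check, the output of fleetctl list-machines is simple enough to parse. The Python sketch below reads text in the format shown above and returns one tuple per machine. The function name parse_list_machines is hypothetical, and the sample text is copied from this tutorial rather than captured from a live cluster, so treat the column layout as an assumption.

```python
def parse_list_machines(output: str):
    """Parse `fleetctl list-machines` text into (machine_id, ip, metadata) tuples."""
    machines = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines and the MACHINE / IP / METADATA header row.
        if len(fields) < 2 or fields[0] == "MACHINE":
            continue
        metadata = fields[2] if len(fields) > 2 else "-"
        machines.append((fields[0], fields[1], metadata))
    return machines


SAMPLE_OUTPUT = """\
MACHINE\t\tIP\t\tMETADATA
59b2fffd...\t10.131.29.141\t-
853b0df3...\t10.131.63.121\t-
cd64a2e3...\t10.131.63.120\t-
"""

if __name__ == "__main__":
    machines = parse_list_machines(SAMPLE_OUTPUT)
    expected = 3  # the number of Droplets created in this tutorial
    status = "ok" if len(machines) == expected else "incomplete"
    print(f"{len(machines)} of {expected} machines online: {status}")
```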

Warning: After the cluster is set up, be sure to configure iptables so that only machines within your CoreOS cluster can reach the etcd client ports (2379 and 4001, which the cloud-config above opens on all interfaces). This will prevent external, unauthorized users from controlling your CoreOS machines. For production use, you should strongly consider following the steps in this guide to securing a CoreOS cluster with TLS/SSL certificates and firewall rules.
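As a rough illustration of what such a restriction could look like, the Python sketch below prints iptables rule lines that accept etcd traffic only from the loopback interface and from a list of cluster member IPs, then drop everything else on those ports. This is an assumption-laden sketch and not a tested firewall: the helper name etcd_firewall_rules is hypothetical, the port list is taken from the cloud-config in this tutorial, and the rules do not account for other traffic (for example from containers) that may need access. Review the linked security guide before applying anything like this.

```python
import ipaddress

# Ports from the tutorial's cloud-config: client (2379, 4001) and peer (2380).
ETCD_PORTS = "2379,2380,4001"


def etcd_firewall_rules(peer_ips):
    """Return iptables rule lines limiting etcd ports to the given peers."""
    rules = ["-A INPUT -i lo -j ACCEPT"]  # local clients such as fleet
    for ip in peer_ips:
        ipaddress.ip_address(ip)  # raises ValueError for a malformed address
        rules.append(
            f"-A INPUT -p tcp -s {ip} -m multiport --dports {ETCD_PORTS} -j ACCEPT"
        )
    # Anything else aimed at the etcd ports is dropped.
    rules.append(f"-A INPUT -p tcp -m multiport --dports {ETCD_PORTS} -j DROP")
    return rules


if __name__ == "__main__":
    # Private IPs from the example fleetctl output in this tutorial.
    for rule in etcd_firewall_rules(["10.131.29.141", "10.131.63.121", "10.131.63.120"]):
        print(rule)
```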

Adding New Machines

If you would like to add new machines to an existing CoreOS cluster, simply create a new Droplet using the same cloud-config (and discovery URL). Your new CoreOS machine will automatically join the existing cluster.

If you forgot which discovery URL you used, you may look it up on one of the members of the cluster. Use the following grep command on one of your existing machines:

grep DISCOVERY /run/systemd/system/etcd2.service.d/20-cloudinit.conf

You will see a line that contains the original discovery URL, like the following:

Environment="ETCD_DISCOVERY=https://discovery.etcd.io/575302f03f4fb2db82e81ea2abca55e9"
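If you need just the URL, for example to feed it into a script that creates the next Droplet, it can be pulled out of that line with a regular expression. This Python sketch assumes the drop-in file uses the Environment="ETCD_DISCOVERY=..." form shown above; the function name extract_discovery_url is hypothetical.

```python
import re


def extract_discovery_url(conf_text: str):
    """Return the ETCD_DISCOVERY URL found in the text, or None if absent."""
    match = re.search(r'ETCD_DISCOVERY=(https?://[^\s"]+)', conf_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    line = 'Environment="ETCD_DISCOVERY=https://discovery.etcd.io/575302f03f4fb2db82e81ea2abca55e9"'
    print(extract_discovery_url(line))
```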

Conclusion

Your basic CoreOS cluster is set up, and now you can move on to testing with it! If you are looking to set up a secure CoreOS cluster, follow this tutorial: How To Secure Your CoreOS Cluster with TLS/SSL and Firewall Rules.

The rest of the tutorials in this series will show you more about CoreOS, and how to use docker containers and service discovery with your CoreOS cluster.


Tutorial Series: Getting Started with CoreOS

CoreOS is a powerful Linux distribution built to make large, scalable deployments on varied infrastructure simple to manage. Based on a build of Chrome OS, CoreOS maintains a lightweight host system and uses Docker containers for all applications. In this series, we will introduce you to the basics of CoreOS, teach you how to set up a CoreOS cluster, and get you started with using docker containers with CoreOS.

10 Comments

Since DigitalOcean droplets with private networking enabled are on the same private network as other customers’ droplets, then if “$private_ipv4” is specified for “addr” and “peer-addr”, isn’t it critical that etcd be secured with TLS and client cert authentication?

See: CoreOS – Etcd: Reading and Writing over HTTPS

I realize that delving into that aspect of coreos/etcd configuration is beyond the scope of this introductory “how to” article, but I believe that some strong mention should be given to this security-related concern.

Worth noting that if users move on to the next part of the series and haven’t ssh’d to their coreOS box with a -A, their ssh agent will not be forwarded, and fleet won’t work as expected. Changing the ssh command in this post to a -A would fix the problems users may see.

Do $public_ipv6 and $private_ipv6 exist as well?

Where can I find a list of all variables available to cloud-install on Digital Ocean?

I had the same problem as icoz. I was able to solve it by setting up a new cluster of machines. I had set up several clusters using the same discovery URL in the cloud-config user data, so I tried generating a fresh URL using the link:

https://discovery.etcd.io/new

Also, I had turned on IPv6 support in my first machines. It is possible it’s related to this as well. In any case, after those two changes I was successful and fleetctl list-machines showed a working cluster.

I am trying to run coreos using your how-to. But on step fleetctl list-machines I got:

E0907 13:56:51.771686 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
E0907 13:56:51.872851 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
E0907 13:56:52.073681 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
E0907 13:56:52.474515 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
E0907 13:56:53.275426 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:54.276284 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:54.771553 00847 fleetctl.go:152] error attempting to check latest fleet version in Registry: timeout reached
E0907 13:56:54.772189 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms
E0907 13:56:54.873013 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 200ms
E0907 13:56:55.073946 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 400ms
E0907 13:56:55.474776 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 800ms
E0907 13:56:56.275632 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
E0907 13:56:57.276739 00847 client.go:200] Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 1s
Error retrieving list of active machines: timeout reached

Trying to see something in etcd:

etcdctl ls
Error: Cannot sync with the cluster using peers 127.0.0.1:4001

journalctl | tail says:

Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.531 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.132.243.51:7001 failed: fail checking join version: Client Internal Error (Get http://10.132.243.51:7001/version: dial tcp 10.132.243.51:7001: i/o timeout)
Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.532 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:33 co1 etcd[914]: [etcd] Sep 7 13:59:33.532 INFO | 0d704bc2bca944f3ae08dca165a8393b is unable to join the cluster using any of the peers [10.131.238.213:7001 10.131.238.213:7001 10.131.238.210:7001 10.131.238.53:7001 10.132.243.51:7001 10.131.238.210:7001] at 0th time. Retrying in 3.8 seconds
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.533 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.213:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.213:7001/version: dial tcp 10.131.238.213:7001: connection refused)
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.534 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.213:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.213:7001/version: dial tcp 10.131.238.213:7001: connection refused)
Sep 07 13:59:36 co1 etcd[914]: [etcd] Sep 7 13:59:36.536 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:37 co1 etcd[914]: [etcd] Sep 7 13:59:37.887 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.53:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.53:7001/version: dial tcp 10.131.238.53:7001: i/o timeout)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.238 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.132.243.51:7001 failed: fail checking join version: Client Internal Error (Get http://10.132.243.51:7001/version: dial tcp 10.132.243.51:7001: i/o timeout)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.242 INFO | 0d704bc2bca944f3ae08dca165a8393b attempted to join via 10.131.238.210:7001 failed: fail checking join version: Client Internal Error (Get http://10.131.238.210:7001/version: dial tcp 10.131.238.210:7001: connection refused)
Sep 07 13:59:39 co1 etcd[914]: [etcd] Sep 7 13:59:39.242 INFO | 0d704bc2bca944f3ae08dca165a8393b is unable to join the cluster using any of the peers [10.131.238.213:7001 10.131.238.213:7001 10.131.238.210:7001 10.131.238.53:7001 10.132.243.51:7001 10.131.238.210:7001] at 1th time. Retrying in 3.8 seconds

What can I do to start CoreOS correctly?

listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001

@manicas I don’t think that is very safe: anyone who knows the public IP can GET/PUT/DELETE keys in etcd.

This was a very useful article, thank you.

The only problem I encountered is that “fleetctl list-machines” shows the public IPs instead of the private ones. I used the following cloud-config:

#cloud-config

coreos:
  etcd2:
    discovery: "Discovery key here"
    advertise-client-urls: http://$private_ipv4:2379,http://$private_ipv4:4001
    initial-advertise-peer-urls: http://$private_ipv4:2380
    listen-client-urls: http://0.0.0.0:2379,http://0.0.0.0:4001
    listen-peer-urls: http://$private_ipv4:2380
  units:
    - name: etcd2.service
      command: start
    - name: fleet.service
      command: start

Is there maybe a fix for this?

Since there is a stern warning about setting up a firewall, there should maybe also be a link to or an example of how it is done in CoreOS. I found this blog post helpful, and all it comes down to is adding some bits into your cloud-config. These additions configure a persistent iptables firewall that lets SSH and HTTP[S] traffic through, plus already established connections and some ICMP messages:

coreos:
  units:
    - name: iptables-restore.service
      enable: true
write_files:
  - path: /var/lib/iptables/rules-save
    permissions: 0644
    owner: root:root
    content: |
      *filter
      :INPUT DROP [0:0]
      :FORWARD DROP [0:0]
      :OUTPUT ACCEPT [0:0]
      -A INPUT -i lo -j ACCEPT
      -A INPUT -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 22 -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
      -A INPUT -p tcp -m tcp --dport 443 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 0 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 3 -j ACCEPT
      -A INPUT -p icmp -m icmp --icmp-type 11 -j ACCEPT
      COMMIT

I had the same problem described in the comments here. It was caused by the limited width of the user-data field in the web form: a long comment line got split into two lines, and cloudinit failed to parse my user-data.

Just remove the comment lines and it will work fine.