The cloud-init program, which is available on recent distributions (only Ubuntu 14.04 and CentOS 7 at the time of this writing), can consume and execute data from the user-data field of the DigitalOcean metadata service. This process behaves differently depending on the format of the information it finds. One of the most popular formats for scripts within user-data is the cloud-config file format.
Cloud-config files are special scripts designed to be run by the cloud-init process. These are generally used for initial configuration on the very first boot of a server. In this guide, we will be discussing the format and usage of cloud-config files.
The cloud-config format implements a declarative syntax for many common configuration items, making it easy to accomplish many tasks. It also allows you to specify arbitrary commands for anything that falls outside of the predefined declarative capabilities.
This “best of both worlds” approach lets the file act like a configuration file for common tasks, while maintaining the flexibility of a script for more complex functionality.
The file is written using the YAML data serialization format. The YAML format was created to be easy to understand for humans and easy to parse for programs.
YAML files are generally fairly intuitive to understand when reading them, but it is good to know the actual rules that govern them.
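Some important rules for YAML files are:
- Indentation with whitespace indicates the structure and relationship of the items to one another. Items that are more indented are sub-items of the first item with a lower level of indentation above them.
- List members can be identified by a leading dash.
- Associative array entries are created by using a colon (:) followed by a space and the value.
- Blocks of text are indented. To indicate that the block should be read as-is, with the formatting maintained, use the pipe character (|) before the block.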
Let’s take these rules and analyze an example
cloud-config file, paying attention only to the formatting:
#cloud-config
users:
  - name: demo
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4PyG0doiBQYV7OlOxbRjle026hJPBWD+eKHWuVXIpAiQlSElEBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV888jWc8hsx4CD2uNfkuHL+NI5xPB/QT3Um2Zi7GRkIwIgNPN5uqUtXvjgA+i1CS0Ku4ld8vndXvr504jV9BMQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4XHDEOjbSMRX firstname.lastname@example.org
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcthLR0qW6y1eWtlmgUE/DveL4XCaqK6PQlWzi445v6vgh7emU4R5DmAsz+plWooJL40dDLCwBt9kEcO/vYzKY9DdHnX8dveMTJNU/OJAaoB1fV6ePvTOdQ6F3SlF2uq77xYTOqBiWjqF+KMDeB+dQ+eGyhuI/z/aROFP6pdkRyEikO9YkVMPyomHKFob+ZKPI4t7TwUi7x1rZB1GsKgRoFkkYu7gvGak3jEWazsZEeRxCgHgAV7TDm05VAWCrnX/+RzsQ/1DecwSzsP06DGFWZYjxzthhGTvH/W5+KFyMvyA+tZV4i1XM+CIv/Ma/xahwqzQkIaKUwsldPPu00jRN user@desktop
runcmd:
  - touch /test.txt
By looking at this file, we can learn a number of important things.
Each cloud-config file must begin with #cloud-config alone on the very first line. This signals to the cloud-init program that this should be interpreted as a cloud-config file. If this were a regular script file, the first line would indicate the interpreter that should be used to execute the file.
The file above has two top-level directives, users and runcmd. These both serve as keys. The values of these keys consist of all of the indented lines after the keys.
In the case of the
users key, the value is a single list item. We know this because the next level of indentation is a dash (-) which specifies a list item, and because there is only one dash at this indentation level. In the case of the
users directive, this incidentally indicates that we are only defining a single user.
The list item itself contains an associative array with more key-value pairs. These are sibling elements because they all exist at the same level of indentation. Each of the user attributes is contained within the single list item we described above.
Some things to note are that the strings you see do not require quoting and that there are no unnecessary brackets to define associations. The interpreter can determine the data type fairly easily and the indentation indicates the relationship of items, both for humans and programs.
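The example file does not use YAML's literal block style, in which a pipe character (|) introduces indented text that is kept as-is, line breaks included. It also contains no string that needs quoting. The sketch below shows both cases. The key names are hypothetical and exist only to illustrate YAML syntax; they are not cloud-config directives:

```yaml
# Hypothetical keys, used only to illustrate YAML syntax
banner_text: |
  Welcome to the server.
  This second line keeps its line break.
inline_list: [one, two, three]
quoted_value: 'contains: a colon and a space, so it is quoted'
```

A colon followed by a space inside an unquoted string would be read as a new key-value pair, which is why the last value is wrapped in quotes.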
By now, you should have a working knowledge of the YAML format and feel comfortable working with information using the rules we discussed above.
We can now begin exploring some of the most common directives for cloud-config files.
To define new users on the system, you can use the
users directive that we saw in the example file above.
The general format of user definitions is:
#cloud-config
users:
  - first_user_parameter
    first_user_parameter

  - second_user_parameter
    second_user_parameter
    second_user_parameter
    second_user_parameter
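Each new user should begin with a dash. Each user defines parameters in key-value pairs. Keys available for definition include:
- name: The account username.
- groups: Any supplementary groups, separated by commas.
- shell: The shell that should be set for the user. If you do not set this, the very basic sh shell will be used.
- sudo: The sudo string to use if you would like to define sudo privileges, without the username field.
- passwd: A hashed password for the account.
- ssh-authorized-keys: A list of complete SSH public keys that should be added to this user’s authorized_keys file in their .ssh directory.
- homedir: Used to override the default /home/<username>, which is otherwise created and set.
- no-create-home: Set to “True” to avoid creating a /home/<username> directory for the user.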
Other than some basic information, like the
name key, you only need to define the areas where you are deviating from the default or supplying needed data.
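As a minimal sketch of relying on defaults, the fragment below defines one user with nothing but a name and a second user that overrides only the shell. The usernames deploy and backup are hypothetical and chosen purely for illustration:

```yaml
#cloud-config
users:
  # Hypothetical user that accepts every default
  - name: deploy
  # Hypothetical user that overrides only the login shell
  - name: backup
    shell: /bin/bash
```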
One thing that is important for users to realize is that the
passwd field should not be used in production systems unless you have a mechanism of immediately modifying the given value. As with all information submitted as user-data, the hash will remain accessible to any user on the system for the entire life of the server. On modern hardware, these hashes can easily be cracked in a trivial amount of time. Exposing even the hash is a huge security risk that should not be taken on any machines that are not disposable.
For an example user definition, we can use part of the example
cloud-config we saw above:
#cloud-config
users:
  - name: demo
    groups: sudo
    shell: /bin/bash
    sudo: ['ALL=(ALL) NOPASSWD:ALL']
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDf0q4PyG0doiBQYV7OlOxbRjle026hJPBWD+eKHWuVXIpAiQlSElEBqQn0pOqNJZ3IBCvSLnrdZTUph4czNC4885AArS9NkyM7lK27Oo8RV888jWc8hsx4CD2uNfkuHL+NI5xPB/QT3Um2Zi7GRkIwIgNPN5uqUtXvjgA+i1CS0Ku4ld8vndXvr504jV9BMQoZrXEST3YlriOb8Wf7hYqphVMpF3b+8df96Pxsj0+iZqayS9wFcL8ITPApHi0yVwS8TjxEtI3FDpCbf7Y/DmTGOv49+AWBkFhS2ZwwGTX65L61PDlTSAzL+rPFmHaQBHnsli8U9N6E4XHDEOjbSMRX email@example.com
      - ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDcthLR0qW6y1eWtlmgUE/DveL4XCaqK6PQlWzi445v6vgh7emU4R5DmAsz+plWooJL40dDLCwBt9kEcO/vYzKY9DdHnX8dveMTJNU/OJAaoB1fV6ePvTOdQ6F3SlF2uq77xYTOqBiWjqF+KMDeB+dQ+eGyhuI/z/aROFP6pdkRyEikO9YkVMPyomHKFob+ZKPI4t7TwUi7x1rZB1GsKgRoFkkYu7gvGak3jEWazsZEeRxCgHgAV7TDm05VAWCrnX/+RzsQ/1DecwSzsP06DGFWZYjxzthhGTvH/W5+KFyMvyA+tZV4i1XM+CIv/Ma/xahwqzQkIaKUwsldPPu00jRN user@desktop
To define groups, you should use the
groups directive. This directive is relatively simple in that it just takes a list of groups you would like to create.
An optional extension to this is to create a sub-list for any of the groups you are making. This new list will define the users that should be placed in this group:
#cloud-config
groups:
  - group1
  - group2: [user1, user2]
For user accounts that already exist (the root account is the most pertinent), a password can be supplied by using the chpasswd directive.
Note: This directive should only be used in debugging situations, because, once again, the value will be available to every user on the system for the duration of the server’s life. This is even more relevant in this section because passwords submitted with this directive must be given in plain text.
The basic syntax looks like this:
#cloud-config
chpasswd:
  list: |
    user1:password1
    user2:password2
    user3:password3
  expire: False
The directive contains two associative array keys. The
list key will contain a block that lists the account names and the associated passwords that you would like to assign. The
expire key is a boolean that determines whether the password must be changed at first boot or not. This defaults to “True”.
One thing to note is that you can set a password to “RANDOM” or “R”, which will generate a random password and write it to
/var/log/cloud-init-output.log. Keep in mind that this file is accessible to any user on the system, so it is not any more secure.
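A minimal sketch of the random-password form, assuming an existing account named user1 (a placeholder, as in the syntax example above):

```yaml
#cloud-config
# user1 is a placeholder for an account that already exists on the system
chpasswd:
  list: |
    user1:RANDOM
  expire: True
```

Leaving expire set to True forces the password to be changed, which limits how long the logged value stays useful.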
In order to write files to the disk, you should use the write_files directive.
Each file that should be written is represented by a list item under the directive. These list items will be associative arrays that define the properties of each file.
The only required keys in this array are
path, which defines where to write the file, and
content, which contains the data you would like the file to contain.
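The available keys for configuring a write_files item are:
- content: The content of the file.
- path: The location on the filesystem where the file should be written.
- owner: The user account and group that should be given ownership of the file, in the “username:group” format.
- permissions: The permissions set that should be given for this file.
- encoding: An optional encoding specification for the file. This can be “b64” for Base64 files, “gzip” for Gzip compressed files, or “gz+b64” for a combination.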
For example, we could write a file to
/test.txt with the contents:
Here is a line.
Another line is here.
The portion of the
cloud-config that would accomplish this would look like this:
#cloud-config
write_files:
  - path: /test.txt
    content: |
      Here is a line.
      Another line is here.
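A slightly fuller sketch might also set ownership and permissions through the owner and permissions keys of a write_files item. The path and file contents below are hypothetical and only illustrate the shape of the entry:

```yaml
#cloud-config
write_files:
  # Hypothetical path and contents, for illustration only
  - path: /etc/example-app.conf
    owner: root:root
    permissions: '0640'
    content: |
      listen_port = 8080
      log_level = info
```

Quoting the permissions value keeps YAML from interpreting it as a number.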
To manage packages, there are a few related settings and directives to keep in mind.
To update the apt database on Debian-based distributions, you should set the
package_update directive to “true”. This is synonymous with calling
apt-get update from the command line.
The default value is actually “true”, so you only need to worry about this directive if you wish to disable it:
#cloud-config
package_update: false
If you wish to upgrade all of the packages on your server after it boots up for the first time, you can set the package_upgrade directive. This is akin to an apt-get upgrade executed manually.
This is set to “false” by default, so make sure you set this to “true” if you want the functionality:
#cloud-config
package_upgrade: true
To install additional packages, you can simply list the package names using the “packages” directive. Each list item should represent a package. Unlike the two commands above, this directive will function with either yum or apt managed distros.
These items can take one of two forms. The first is simply a string with the name of the package. The second form is a list with two items. The first item of this new list is the package name, and the second item is the version number:
#cloud-config
packages:
  - package_1
  - package_2
  - [package_3, version_num]
The “packages” directive will set
apt_update to true, overriding any previous setting.
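The package-related directives can be combined in one file. The following is a sketch only; nginx and git are example package names, not a recommendation:

```yaml
#cloud-config
# Refresh the package index and upgrade installed packages on first boot
package_update: true
package_upgrade: true
# Example package names; substitute the packages you need
packages:
  - nginx
  - git
```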
You can manage SSH keys in the
users directive, but you can also specify them in a dedicated
ssh_authorized_keys section. These will be added to the first defined user’s authorized_keys file.
This takes the same general format of the key specification within the users directive:
#cloud-config
ssh_authorized_keys:
  - ssh_key_1
  - ssh_key_2
You can also generate the SSH server’s private keys ahead of time and place them on the filesystem. This can be useful if you want to give your clients the information about this server beforehand, allowing them to trust the server as soon as it comes online.
To do this, we can use the ssh_keys directive. This can take the key pairs for RSA, DSA, or ECDSA keys using the rsa_private, rsa_public, dsa_private, dsa_public, ecdsa_private, and ecdsa_public sub-items.
Since formatting and line breaks are important for private keys, make sure to use a block introduced by a pipe character (|) when specifying these. Also, you must include the begin key and end key lines for your keys to be valid.
#cloud-config
ssh_keys:
  rsa_private: |
    -----BEGIN RSA PRIVATE KEY-----
    your_rsa_private_key
    -----END RSA PRIVATE KEY-----

  rsa_public: your_rsa_public_key
If your infrastructure relies on keys signed by an internal certificate authority, you can set up your new machines to trust your CA cert by injecting the certificate information. For this, we use the ca-certs directive.
This directive has two sub-items. The first is
remove-defaults, which, when set to true, will remove all of the normal certificate trust information included by default. This is usually not needed and can lead to some issues if you don’t know what you are doing, so use with caution.
The second item is trusted, which is a list in which each item contains a trusted CA certificate:
#cloud-config
ca-certs:
  remove-defaults: true
  trusted:
    - |
      -----BEGIN CERTIFICATE-----
      your_CA_cert
      -----END CERTIFICATE-----
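Since removing the default trust information is usually not needed, a more cautious sketch leaves remove-defaults out and only adds certificates. The two certificate bodies are placeholders:

```yaml
#cloud-config
ca-certs:
  # remove-defaults is omitted, so the distribution's default trust store is kept
  trusted:
    - |
      -----BEGIN CERTIFICATE-----
      first_CA_cert
      -----END CERTIFICATE-----
    - |
      -----BEGIN CERTIFICATE-----
      second_CA_cert
      -----END CERTIFICATE-----
```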
If you have configured your own DNS servers that you wish to use, you can manage your server’s resolv.conf file by using the
resolv_conf directive. This currently only works for RHEL-based distributions.
Under the resolv_conf directive, you can manage your settings with the nameservers, searchdomains, domain, and options directives.
The nameservers directive should take a list of the IP addresses of your name servers. The searchdomains directive takes a list of domains and subdomains to search in when a user specifies a host but not a domain.
The domain directive sets the domain that should be used for any unresolvable requests, and options contains a set of options that can be defined in the resolv.conf file.
If you are using the
resolv_conf directive, you must ensure that the
manage-resolv-conf directive is also set to true. Not doing so will cause your settings to be ignored:
#cloud-config
manage-resolv-conf: true
resolv_conf:
  nameservers:
    - 'first_nameserver'
    - 'second_nameserver'
  searchdomains:
    - first.domain.com
    - second.domain.com
  domain: domain.com
  options:
    option1: value1
    option2: value2
    option3: value3
If none of the managed actions that cloud-config provides works for what you want to do, you can also run arbitrary commands. You can do this with the runcmd directive.
This directive takes a list of items to execute. These items can be specified in two different ways, which will affect how they are handled.
If the list item is a simple string, the entire item will be passed to the
sh shell process to run.
The other option is to pass a list, each item of which will be executed in a similar way to how
execve processes commands. The first item will be interpreted as the command or script to run, and the following items will be passed as arguments for that command.
Most users can use either of these formats, but the flexibility enables you to choose the best option if you have special requirements. Any output will be written to standard out and to the /var/log/cloud-init-output.log file:
#cloud-config
runcmd:
  - [ sed, -i, -e, 's/here/there/g', some_file]
  - echo "modified some_file"
  - [cat, some_file]
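The choice between the two forms matters most when shell features or awkward arguments are involved. The sketch below is illustrative only; the log path and directory name are hypothetical:

```yaml
#cloud-config
runcmd:
  # String form: handed to sh, so redirection and command substitution work
  - echo "first boot finished at $(date)" >> /root/first-boot.log
  # List form: each element is one argument, so the space in the path needs no shell escaping
  - [ mkdir, -p, "/opt/example app" ]
```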
In some cases, you’ll want to shut down or reboot your server after executing the other items. You can do this by setting up the power_state directive.
This directive has four sub-items that can be set. These are delay, timeout, message, and mode.
The delay specifies how long into the future the restart or shutdown should occur. By default, this will be “now”, meaning the procedure will begin immediately. To add a delay, users should specify, in minutes, the amount of time that should pass using a plus sign followed by the number of minutes (for example, “+5”).
The timeout parameter takes a unit-less value that represents the number of seconds to wait for cloud-init to complete before initiating the power state change.
The message field allows you to specify a message that will be sent to all users of the system. The mode specifies the type of power event to initiate. This can be “poweroff” to shut down the server, “reboot” to restart the server, or “halt” to let the system decide which is the best action (usually shutdown):
#cloud-config
power_state:
  timeout: 120
  delay: "+5"
  message: Rebooting in five minutes. Please save your work.
  mode: reboot
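For comparison, a sketch of an immediate shutdown once configuration finishes, using the default delay of “now”. The message text is an arbitrary example:

```yaml
#cloud-config
power_state:
  # "now" is the default delay; it is written out here for clarity
  delay: now
  mode: poweroff
  message: Initial configuration complete. Powering off.
```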
The above examples represent some of the more common configuration items available when running a
cloud-config file. There are additional capabilities that we did not cover in this guide. These include configuration management setup, configuring additional repositories, and even registering with an outside URL when the server is initialized.
You can find out more about some of these options by checking the
/usr/share/doc/cloud-init/examples directory. For a practical guide to help you get familiar with
cloud-config files, you can follow our tutorial on how to use cloud-config to complete basic server configuration.