Question

CoreOS shows failed units at login, what does it mean?

When I logged into my CoreOS machine recently, I noticed a growing list of “failed units”:

$ ssh core@188.166.209.155
Last login: Fri Jan 8 14:36:34 2016 from 101.176.57.46
CoreOS stable (835.9.0)
Failed Units: 5
  sshd@16493-188.166.209.155:22-222.186.15.79:2573.service
  sshd@1756-188.166.209.155:22-151.25.198.11:59295.service
  sshd@1766-188.166.209.155:22-151.25.198.11:55062.service
  sshd@23027-188.166.209.155:22-111.74.239.61:2490.service
  sshd@23028-188.166.209.155:22-111.74.239.61:2039.service

There are no explanations of what they are. Can someone explain? They seem to be a list of broken SSH connections…?


First, investigate the failed units (systemctl --failed lists them). These are frequently records of failed SSH connection attempts, or of dropped sessions from legitimate users.

Then clear them:

sudo systemctl reset-failed

This happened to me as well. Join me on my journey to find an answer below.

Phrasing the Question

This happens when some systemd unit fails on the base system. (This has nothing to do with docker, or docker images.)

Presumably, the sshd server is failing, which may or may not be related to a dropped connection.

You can use these commands to see more information:

$ systemctl --failed
$ systemctl status sshd@16493-188.166.209.155:22-222.186.15.79:2573.service
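When many instances have failed, it can help to pull out just the failed sshd@ unit names so they can be passed to systemctl status one at a time. The following is a rough shell/awk sketch, not a CoreOS tool: failed_sshd_units is a hypothetical helper name, and it assumes the default list-units column order (UNIT LOAD ACTIVE SUB DESCRIPTION) with an optional bullet in front of failed units. The sample input is abridged from the list-units output later in this answer.

```shell
#!/bin/sh
# failed_sshd_units: hypothetical helper. Reads `systemctl list-units` output
# on stdin and prints the UNIT column of every failed sshd@ instance.
failed_sshd_units() {
  awk '{
    sub(/^[^A-Za-z0-9]+/, "")   # strip leading spaces and the failure bullet
    if ($1 ~ /^sshd@/ && $3 == "failed") print $1   # $3 is the ACTIVE column
  }'
}

# Sample input, abridged from the list-units output later in this thread.
failed_sshd_units <<'EOF'
  sshd-keygen.service                                       loaded active exited    Generate sshd host keys
  sshd@1296-104.236.58.186:22-137.22.169.192:43537.service  loaded active running   OpenSSH per-connection server daemon (98.22.169.192:43537)
● sshd@532-104.236.58.186:22-277.186.58.136:1283.service    loaded failed failed    OpenSSH per-connection server daemon (222.186.58.136:1283)
  sshd.socket                                               loaded active listening OpenSSH Server Socket
EOF

# On a live host (not run here):
#   for u in $(systemctl list-units | failed_sshd_units); do systemctl status "$u"; done
```

With the sample above it prints only the sshd@532 unit, the one whose ACTIVE column reads "failed".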

For reference, I’m getting the following error:

Failed to run 'start' task: Transport endpoint is not connected

Finding the Answer

Here is my output from systemctl list-units, which finally helped me figure this out:

$ systemctl list-units | grep sshd
  sshd-keygen.service                                                           loaded active exited    Generate sshd host keys
  sshd@1296-104.236.58.186:22-137.22.169.192:43537.service                       loaded active running   OpenSSH per-connection server daemon (98.22.169.192:43537)
● sshd@532-104.236.58.186:22-277.186.58.136:1283.service                        loaded failed failed    OpenSSH per-connection server daemon (222.186.58.136:1283)
  system-sshd.slice                                                             loaded active active    system-sshd.slice
  sshd.socket                                                                   loaded active listening OpenSSH Server Socket

It looks like sshd.socket is where systemd listens for new SSH connections, starting a separate sshd@ service instance for each one. The sshd@ lines report one failure and one success (for me). These appear to take the form of

sshd@1234-<server-IP>:<server-port>-<remote-IP>:<remote-port>.service
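If you want to pull the remote address out of one of these names in a script, a small POSIX shell sketch could look like this. It assumes IPv4 addresses and exactly the layout above; parse_sshd_unit is a hypothetical helper, and the leading number is reported as a plain "n" field because its exact meaning (it appears to be a per-connection counter) is not confirmed here.

```shell
#!/bin/sh
# parse_sshd_unit: hypothetical helper, not part of systemd or CoreOS.
# Splits an instance name of the (assumed, IPv4-only) form
#   sshd@<n>-<server-IP>:<server-port>-<remote-IP>:<remote-port>.service
# into its fields. IPv4 endpoints contain no '-', so '-' separates the parts.
parse_sshd_unit() {
  local inst n rest server remote
  inst=${1#sshd@}        # drop the template prefix
  inst=${inst%.service}  # drop the unit suffix
  n=${inst%%-*}          # text before the first '-'
  rest=${inst#*-}        # text after the first '-'
  server=${rest%%-*}     # <server-IP>:<server-port>
  remote=${rest#*-}      # <remote-IP>:<remote-port>
  printf 'n=%s server_ip=%s server_port=%s remote_ip=%s remote_port=%s\n' \
    "$n" "${server%:*}" "${server##*:}" "${remote%:*}" "${remote##*:}"
}

parse_sshd_unit 'sshd@16493-188.166.209.155:22-222.186.15.79:2573.service'
# prints: n=16493 server_ip=188.166.209.155 server_port=22 remote_ip=222.186.15.79 remote_port=2573
```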

This explains what happened: I tried logging in from work the other day, and did not have the correct ssh key. So, for me at least, this represents a failed login. That is a security crisis averted, and I’m grateful that CoreOS brought it to my attention, but I haven’t figured out how to clear it yet.

Postscript

Good luck with your new web service!

It means somebody (not necessarily you, perhaps a bad guy) tried to log in via SSH and failed.
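To check whether these are outside attempts rather than your own dropped sessions, a quick tally of the remote addresses can help. A rough shell sketch: count_remote_ips is a hypothetical helper, and it assumes IPv4 addresses and the sshd@ unit-name layout seen in the question.

```shell
#!/bin/sh
# count_remote_ips: hypothetical helper. Extracts sshd@ unit names from any
# text on stdin (login banner, systemctl --failed output, ...) and counts how
# often each remote IPv4 address appears, most frequent first.
count_remote_ips() {
  grep -o 'sshd@[^ ]*\.service' \
    | sed -n 's/^sshd@[0-9]*-[^-]*-\(.*\):[0-9]*\.service$/\1/p' \
    | sort | uniq -c | sort -rn
}

# Sample input: the "Failed Units" list from the login banner in the question.
count_remote_ips <<'EOF'
sshd@16493-188.166.209.155:22-222.186.15.79:2573.service
sshd@1756-188.166.209.155:22-151.25.198.11:59295.service
sshd@1766-188.166.209.155:22-151.25.198.11:55062.service
sshd@23027-188.166.209.155:22-111.74.239.61:2490.service
sshd@23028-188.166.209.155:22-111.74.239.61:2039.service
EOF
```

On a live host you could pipe systemctl --failed into it instead of the sample; addresses you don't recognize, repeated from many source ports, are a typical sign of automated probing.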

To protect the host from compromise or resource exhaustion, you may want to move SSH off the default port and disable password authentication (PasswordAuthentication no in sshd_config). Many script kiddies stop trying once they find that password authentication is not available.
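One caveat on changing the port: on CoreOS, sshd is started per connection by sshd.socket (it appears in the list-units output earlier in the thread), so the listening port is set on that socket unit rather than by the Port option in sshd_config. Below is a minimal sketch of a systemd drop-in that does this. The helper name write_sshd_port_dropin, the port 2222 and the file name 10-port.conf are placeholders, not anything CoreOS ships.

```shell
#!/bin/sh
# write_sshd_port_dropin: hypothetical helper. Writes a drop-in for
# sshd.socket under the given unit directory that moves SSH to a new port.
write_sshd_port_dropin() {
  local unit_dir port dropin_dir
  unit_dir=$1
  port=$2
  dropin_dir="$unit_dir/sshd.socket.d"
  mkdir -p "$dropin_dir"
  # The empty ListenStream= clears the inherited port-22 listener;
  # the second line adds the new one.
  cat > "$dropin_dir/10-port.conf" <<EOF
[Socket]
ListenStream=
ListenStream=$port
EOF
}

# Demo in a scratch directory; on a real host the directory would be
# /etc/systemd/system and writing there would need root.
demo_dir=$(mktemp -d)
write_sshd_port_dropin "$demo_dir" 2222
cat "$demo_dir/sshd.socket.d/10-port.conf"
```

To apply it for real, the file would go under /etc/systemd/system, followed by sudo systemctl daemon-reload and sudo systemctl restart sshd.socket. Keep an existing session open until you have confirmed you can log in on the new port.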