SSH Connection to Droplet Is Unstable: Occasional "timed out" Errors

July 28, 2019
DigitalOcean Docker Debian

Hi,

I recently created a Docker droplet and am trying to create a generic docker-machine from my Windows workspace.

However, creating the docker-machine never succeeds. After debugging, I found that SSH connections to my droplet only succeed some of the time.

Below is the command I use to create the docker-machine:

 docker-machine -D --native-ssh create --driver generic --generic-ip-address=<IP> --generic-ssh-key <RSA KEY> remote

The command gets stuck while running an SSH command. Below is a partial log:

About to run SSH command:
sudo hostname docker && echo "docker" | sudo tee /etc/hostname
SSH cmd err, output: <nil>:
(docker) Calling .GetSSHHostname
(docker) Calling .GetSSHPort
(docker) Calling .GetSSHKeyPath
(docker) Calling .GetSSHKeyPath
(docker) Calling .GetSSHUsername
Using SSH client type: native
&{{{<nil> 0 [] [] []} root [0x76ed50] 0x76ed00  [] 0s} 167.71.214.248 22 <nil> <nil>}
About to run SSH command:

                if ! grep -xq '.*\sdocker' /etc/hosts; then
                        if grep -xq '127.0.1.1\s.*' /etc/hosts; then
                                sudo sed -i 's/^127.0.1.1\s.*/127.0.1.1 docker/g' /etc/hosts;
                        else
                                echo '127.0.1.1 docker' | sudo tee -a /etc/hosts;
                        fi
                fi
Error dialing TCP: dial tcp 167.71.214.248:22: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Error dialing TCP: dial tcp 167.71.214.248:22: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Error dialing TCP: dial tcp 167.71.214.248:22: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
Error dialing TCP: dial tcp 167.71.214.248:22: connectex: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

I then tried connecting to my droplet directly with the command below:

ssh root@167.71.214.248 echo hi

The command does not always succeed. Below is my screenshot:

[screenshot]

I'm wondering what's going wrong? Please kindly help me. Thanks a lot.
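
One way to quantify how often the connection fails, independent of any particular SSH client, is to open plain TCP connections to port 22 repeatedly and count the failures. Below is a minimal Python sketch of that idea. The function names `probe_tcp` and `failure_rate` are illustrative and not part of docker-machine or any tool mentioned in this thread, and the timeout and attempt counts are arbitrary assumptions.

```python
import socket
import time
from typing import List, Optional


def probe_tcp(host: str, port: int = 22, attempts: int = 10,
              timeout: float = 5.0, pause: float = 0.0) -> List[Optional[float]]:
    """Try to open a TCP connection `attempts` times.

    Returns one entry per attempt: the connect time in seconds,
    or None if the attempt timed out or was refused.
    """
    results: List[Optional[float]] = []
    for i in range(attempts):
        start = time.monotonic()
        try:
            # Only the TCP handshake is tested; no SSH traffic is sent.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.monotonic() - start)
        except OSError:
            # Covers timeouts, refused connections, and unreachable hosts.
            results.append(None)
        if pause and i < attempts - 1:
            time.sleep(pause)
    return results


def failure_rate(results: List[Optional[float]]) -> float:
    """Fraction of attempts that failed (0.0 for an empty list)."""
    if not results:
        return 0.0
    return sum(r is None for r in results) / len(results)


# Example (hypothetical usage against the droplet from this thread):
#   results = probe_tcp("167.71.214.248", port=22, attempts=20, pause=1.0)
#   print(f"{failure_rate(results):.0%} of connection attempts failed")
```

If the failure rate is clearly above zero here as well, the problem lies at the network level rather than in a specific SSH client.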


Reference

  1. Droplet Info
    [screenshot of droplet info]

  2. Docker machine version:

    docker-machine.exe version 0.14.0, build 89b8332
    

    I have tried versions 0.12 through 0.16 and still encounter the same error.

  3. My computer

    1. Windows 10 Home Edition
    2. Tried both OpenSSH and the SSH client in the Docker Quickstart Terminal
    3. Using Docker Toolbox
2 Answers
alexgeorgiev July 28, 2019
Accepted Answer

Hello

This could mean there is a (temporary) problem with the server (or some router along the way). You can try a traceroute to determine whether or not this is true.

You can run the traceroute on macOS/Linux using:

traceroute 167.71.214.248

On Windows using:

tracert 167.71.214.248

Let me know if you have any questions.
Alex

  • Hi Alex,

    Thank you for the reply.
    I tried tracert; the result is below:

    [screenshot of tracert output]

    It always shows "Request timed out" at hops 15 and 16. Could that be the root cause?

    Is there any other resource I could refer to in order to fix this issue?

    Thank you again for the reply.

    • Thanks for getting back to me.

      Looking at this (I've also run some traceroute tests on my end), there might be a misconfigured router along the path, because I'm able to reproduce your issue.

      I got 100% packet loss at hop 15:

        1.|-- 10.97.21.252               0.0%    10   21.4  18.3  12.7  23.8   3.7
        2.|-- 10.97.23.224               0.0%    10    1.7   1.1   0.4   1.8   0.6
        3.|-- 94.155.94.22               0.0%    10    2.8   2.6   0.9  10.6   2.9
        4.|-- sfia-b2-link.telia.net     0.0%    10    1.7   3.5   1.3   7.1   1.9
        5.|-- prag-bb1-link.telia.net    0.0%    10   27.8  29.2  27.2  35.7   2.5
        6.|-- win-b4-link.telia.net      0.0%    10   33.5  34.2  33.2  36.1   0.9
        7.|-- 195.219.25.36              0.0%    10   34.4  34.7  34.1  35.9   0.7
        8.|-- if-ae-3-2.tcore2.fnm-fran  0.0%    10  272.2 272.7 272.0 273.8   0.6
        9.|-- if-ae-7-2.tcore2.wyn-mars  0.0%    10  276.5 276.7 276.2 277.7   0.4
       10.|-- if-ae-2-2.tcore1.wyn-mars  0.0%    10  281.8 282.2 281.7 283.0   0.5
       11.|-- if-ae-31-6.tcore1.svw-sin  0.0%    10  276.3 277.1 276.0 281.1   1.5
       12.|-- if-ae-11-2.thar1.svq-sing  0.0%    10  272.0 273.0 271.6 278.5   2.0
       13.|-- 120.29.214.142             0.0%    10  276.2 277.2 275.8 280.5   1.7
       14.|-- 138.197.245.13             0.0%    10  272.2 272.7 272.1 274.6   0.8
       15.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
       16.|-- 167.71.214.248             0.0%    10  272.9 272.8 272.0 273.6   0.6
      
      

      This could also be because some routers are configured to discard ICMP (which traceroute relies on), so the missing replies show up in the output as timeouts (???). Since the final hop, the droplet itself, shows 0% loss in this report, the loss at hop 15 alone does not prove that traffic is being dropped.

      However, when running mtr (My traceroute) against the SSH port, 22, I can see 20% packet loss on the droplet itself:

      mtr --tcp --port 22 --report --report-cycles 10 167.71.214.248
      Start: 2019-07-28T16:02:37+0300
                              Loss%   Snt   Last   Avg  Best  Wrst StDev
       16.|-- 138.197.245.11            30.0%    10  7587. 1327. 281.2 7587. 2760.5
       17.|-- 167.71.214.248            20.0%    10  7389. 2950. 266.7 7489. 3687.8
      

      There is also 30% packet loss on 138.197.245.11, which is an IP address associated with DigitalOcean. I'd suggest contacting their support team and asking them to take a look as well, since we're both experiencing the issue. They should be able to shed some light on this.

      Let me know if you have any questions.
      Alex
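
The distinction drawn in the reply above, between loss at an intermediate hop only and loss that reaches the destination, can be checked mechanically. Below is a rough Python sketch that parses `mtr --report` lines like the ones quoted in this thread. The names `Hop`, `parse_mtr_report`, `destination_loss`, and `intermediate_only_loss` are made up for illustration, and the regular expression assumes the line layout shown above. Other mtr versions may format reports differently.

```python
import re
from typing import List, NamedTuple


class Hop(NamedTuple):
    index: int
    host: str
    loss_pct: float


# Matches report lines such as:
#  " 15.|-- ???                       100.0    10    0.0 ..."
# The "%" sign is optional because the 100.0 line in the thread lacks it.
_HOP_RE = re.compile(r"^\s*(\d+)\.\|--\s+(\S+)\s+(\d+(?:\.\d+)?)%?\s+\d+")


def parse_mtr_report(text: str) -> List[Hop]:
    """Extract (hop number, host, loss %) from mtr report text."""
    hops = []
    for line in text.splitlines():
        m = _HOP_RE.match(line)
        if m:
            hops.append(Hop(int(m.group(1)), m.group(2), float(m.group(3))))
    return hops


def destination_loss(hops: List[Hop]) -> float:
    """Loss percentage at the last hop, which is the figure that matters."""
    return hops[-1].loss_pct if hops else 0.0


def intermediate_only_loss(hops: List[Hop]) -> List[Hop]:
    """Hops showing loss while the destination shows none.

    Such loss usually means a router is ignoring or rate-limiting probes,
    not that real traffic is being dropped.
    """
    if not hops or hops[-1].loss_pct > 0:
        return []
    return [h for h in hops[:-1] if h.loss_pct > 0]


# Sample lines copied from the two reports in this thread.
ICMP_SAMPLE = """
 14.|-- 138.197.245.13             0.0%    10  272.2 272.7 272.1 274.6   0.8
 15.|-- ???                       100.0    10    0.0   0.0   0.0   0.0   0.0
 16.|-- 167.71.214.248             0.0%    10  272.9 272.8 272.0 273.6   0.6
"""

TCP_SAMPLE = """
 16.|-- 138.197.245.11            30.0%    10  7587. 1327. 281.2 7587. 2760.5
 17.|-- 167.71.214.248            20.0%    10  7389. 2950. 266.7 7489. 3687.8
"""
```

On these samples, the ICMP report has loss only at hop 15, while the TCP report on port 22 shows loss at the destination itself, which is the more worrying result.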

Hi community,

I contacted the support team for help.

They replied that the problem may be due to Windows' OpenSSH.

They said some layers can introduce software firewalls or application blocking.

I tried MinGW's ssh. It still gets "timed out" sometimes, but it is much more stable than OpenSSH.

However, since my docker-machine cannot run successfully with external SSH, the problem is still not solved.

I'll keep updating this thread if there is any news.

Hope this could help others.
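
While the intermittent timeouts remain unresolved, one possible stopgap for plain `ssh` commands is to retry with a backoff. The Python sketch below is only an illustration: `run_with_retries` is a hypothetical helper, the retry counts and delays are arbitrary, and it does not address the docker-machine problem itself.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence


def run_with_retries(
    cmd: Sequence[str],
    attempts: int = 5,
    base_delay: float = 1.0,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> subprocess.CompletedProcess:
    """Run `cmd` until it exits with status 0 or attempts run out.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between tries
    and returns the result of the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    result: Optional[subprocess.CompletedProcess] = None
    for attempt in range(attempts):
        result = run(list(cmd), capture_output=True, text=True)
        if result.returncode == 0:
            break
        if attempt < attempts - 1:
            time.sleep(base_delay * 2 ** attempt)
    return result


# Example (hypothetical usage with the command from this thread):
#   result = run_with_retries(
#       ["ssh", "-o", "ConnectTimeout=10", "-o", "BatchMode=yes",
#        "root@167.71.214.248", "echo", "hi"]
#   )
#   print(result.returncode, result.stdout)
```

`ConnectTimeout` and `BatchMode` are standard OpenSSH client options; the first caps how long each attempt waits, and the second prevents the command from hanging on a password prompt.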
