Question

DigitalOcean DNS server not always reachable, resulting in getaddrinfo failed: Temporary failure in name resolution

I have a standard droplet (Ubuntu 20.04) and a managed MySQL 8 database. Both are in the same VPC. On the droplet, an application connects to the database via the hostname intended for use within the VPC: private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com

The application infrequently reports that it cannot resolve that VPC hostname, with the error message: SQLSTATE[HY000] [2002] php_network_getaddresses: getaddrinfo failed: Temporary failure in name resolution

Here are some timings to get an idea about the frequency. Most of the time, the duration of the issue is only a couple of seconds.

2022/05/08 19:32:14 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:15 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:19 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:19 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:21 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:25 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:25 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:30 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:31 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:34 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:35 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:36 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:37 getaddrinfo failed: Temporary failure in name resolution
2022/05/08 19:32:38 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:10 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:14 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:16 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:17 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:22 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:23 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:24 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:32 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:33 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:33 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 21:43:34 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:38 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:39 getaddrinfo failed: Temporary failure in name resolution
2022/05/09 23:58:42 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:43 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:43 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:44 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:44 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:54 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:55 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:25:57 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:05 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:09 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:11 getaddrinfo failed: Temporary failure in name resolution
2022/05/10 00:26:15 getaddrinfo failed: Temporary failure in name resolution

The file /etc/resolv.conf contains:

nameserver 127.0.0.53
options edns0 trust-ad

which has been untouched since creating the droplet.
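
As far as I understand, 127.0.0.53 is the local stub listener of systemd-resolved (the Ubuntu 20.04 default), so this file alone does not show which upstream DNS servers are actually queried. A minimal sketch for checking that, assuming systemd-resolved is in use; the function name uses_stub_resolver is only illustrative:

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): succeeds when the given
# resolv.conf points at the systemd-resolved stub listener (127.0.0.53).
uses_stub_resolver() {
  file="${1:-/etc/resolv.conf}"
  grep -Eq '^[[:space:]]*nameserver[[:space:]]+127\.0\.0\.53([[:space:]]|$)' "$file"
}

# On the droplet, the upstream servers behind the stub could then be listed with:
#   uses_stub_resolver && resolvectl status
```

If the function succeeds, the timeouts reported by dig further down are timeouts against this local stub, which in turn forwards to the upstream servers shown by resolvectl status.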

So I made 2 scripts to investigate the issue. One script endlessly pings the database VPC hostname and writes each reply to a file, prefixed with the date and time.

ping private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com | while read pong; do echo "$(date): $pong"; done > "/root/ping-result.log"

The other script is endlessly querying the DNS server to resolve that database VPC hostname. The interval is 0.5 seconds so I won’t miss a downtime spot.

while true; do
  sleep 0.5
  dig private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com &>> ~/dig-$(date +"%Y%m%d-%H").log
done
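
A possible refinement of this loop, as a sketch only: a timed-out dig run prints no WHEN line, so the log has no timestamp for exactly the runs that matter. The variant below writes one timestamped line per query and uses dig's +time=1 +tries=1 options so a failed query does not block the loop for long. The function names classify_dig_output and probe_loop are made up for illustration.

```shell
#!/usr/bin/env bash
# Classify one dig run (read from stdin) as OK, TIMEOUT, or OTHER.
classify_dig_output() {
  out=$(cat)
  case "$out" in
    *'status: NOERROR'*)      echo OK ;;
    *'connection timed out'*) echo TIMEOUT ;;
    *)                        echo OTHER ;;
  esac
}

# Probe a hostname every 0.5 s and append one timestamped result line per query.
# +time=1 +tries=1 limits each failed query to roughly one second.
probe_loop() {
  host="$1"
  log="$2"
  while true; do
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" \
      "$(dig +time=1 +tries=1 "$host" 2>&1 | classify_dig_output)" >> "$log"
    sleep 0.5
  done
}
```

It would be started with something like probe_loop "the-vpc-hostname" ~/dig-probe.log, where both arguments are placeholders.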

Now I only had to wait for the next failure, which I share with you here. The last message from the log above (2022/05/10 00:26:15) is taken as the sample.

ping-result.log doesn’t show any problems. The database machine is pingable, which was expected: once the ping loop is running, no DNS lookup is involved. The hostname is resolved to its IP address only once, when ping starts. Below is the time span in which the error ‘Temporary failure in name resolution’ was recorded.

Tue May 10 00:26:10 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15891 ttl=64 time=0.636 ms
Tue May 10 00:26:11 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15892 ttl=64 time=0.660 ms
Tue May 10 00:26:12 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15893 ttl=64 time=0.455 ms
Tue May 10 00:26:13 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15894 ttl=64 time=0.632 ms
Tue May 10 00:26:14 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15895 ttl=64 time=0.622 ms
Tue May 10 00:26:15 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15896 ttl=64 time=0.645 ms
Tue May 10 00:26:16 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15897 ttl=64 time=0.549 ms
Tue May 10 00:26:17 CEST 2022: 64 bytes from 10.110.64.5 (10.110.64.5): icmp_seq=15898 ttl=64 time=0.630 ms

Moving on to the ‘dig’ DNS query log.

The dig-20220510-00.log shows the interesting part. I’m posting this in separate blocks of text to make clear what the output of each ‘dig’ request was.

; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 51792
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. IN A

;; ANSWER SECTION:
private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. 0 IN A 10.110.64.5

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue May 10 00:25:29 CEST 2022
;; MSG SIZE  rcvd: 116

; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached

; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached

; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; connection timed out; no servers could be reached

; <<>> DiG 9.16.1-Ubuntu <<>> private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 27130
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. IN A

;; ANSWER SECTION:
private-prod-customername-do-user-*******-0.b.db.ondigitalocean.com. 14 IN A 10.110.64.5

;; Query time: 0 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Tue May 10 00:26:16 CEST 2022
;; MSG SIZE  rcvd: 116

I included 2 successful ‘dig’ queries (the first and the last one). In between, 3 queries timed out. Since a successful ‘dig’ query is very fast, successful queries are normally recorded about 0.5 seconds apart. Judging by the timestamps of those 2 successful queries, the DNS server appears to have been unreachable for about 47 seconds (00:25:29 to 00:26:16).
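
The gap is consistent with dig's documented defaults of 3 tries with a 5-second timeout each: every timed-out run would then wait roughly 15 seconds, so 3 of them plus the sleeps add up to about 47 seconds. A small sketch to summarise a log this way; summarize_dig_log is an illustrative name, and the 3 x 5 s figures are an assumption that only holds when +time and +tries are not set:

```shell
#!/usr/bin/env bash
# Summarise a dig log read from stdin: number of answers, number of
# timed-out runs, and the approximate time spent waiting in those runs.
# Assumes dig's defaults (3 tries, 5-second timeout per try).
summarize_dig_log() {
  awk -v tries=3 -v timeout=5 '
    /status: NOERROR/      { ok++ }
    /connection timed out/ { failed++ }
    END {
      printf "answers=%d timeouts=%d approx_wait=%ds\n", ok, failed, failed * tries * timeout
    }
  '
}
```

For example: summarize_dig_log < ~/dig-20220510-00.log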

Now I would like to know: is this an issue that can only be resolved by DigitalOcean? Is there more to research? Is there more I can share with you that may lead to the real root cause of the getaddrinfo failed: Temporary failure in name resolution errors?

In the meantime, I will set up another fresh droplet within that VPC and run the same scripts.

Based on the current research results, I think it’s safe to assume that the DigitalOcean DNS services have network or stability issues which need to be resolved.
