How can I determine the root cause of an unstable clock source?

February 8, 2016
Server Optimization Applications CentOS

We noticed one of our droplets reported the following line in the messages log:

kernel: TSC appears to be running slowly. Marking it as unstable

We have many droplets running Asterisk 1.4 on CentOS 5.9 with the 2.6.18-308.1.1.el5 Linux kernel. Only one of them appears to be having a significant issue with keeping time.

DAHDI is logging timeshifts to the messages log at an unusually high rate. Here's a sampling of what we see in the messages log:

I came across this blog article on "Fixing unstable clocksource in virtualised CentOS", but before I try implementing a clocksource failover, I thought I would ask for thoughts from the community here. Why do you think the clocksource on this droplet is slow?
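
Before changing anything, it seems worth recording which clocksource the kernel is using now and which alternatives it offers. Below is a minimal sketch, assuming Python 3 is installed (it is not by default on CentOS 5) and that the kernel exposes the usual sysfs clocksource directory. An older kernel such as 2.6.18 may not expose it, in which case the script reports nothing. The function names are my own, purely for illustration.

```python
import os

# Usual sysfs location for the clocksource framework (assumed to exist here;
# older kernels may not expose it).
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def parse_available(text):
    """Split the space-separated list found in available_clocksource."""
    return text.split()


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available), or (None, []) if the sysfs files are missing."""
    try:
        with open(os.path.join(base, "current_clocksource")) as f:
            current = f.read().strip()
        with open(os.path.join(base, "available_clocksource")) as f:
            available = parse_available(f.read())
    except OSError:
        return None, []
    return current, available


if __name__ == "__main__":
    current, available = read_clocksources()
    if current is None:
        print("clocksource sysfs entries not found on this kernel")
    else:
        print("current:  ", current)
        print("available:", ", ".join(available))
```

Running this on the affected droplet and on a healthy one would show whether they are using the same clocksource to begin with.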

Could a slow clocksource also result in incorrect load averages? This droplet has four CPU cores, and its reported load averages are higher than those of other similar systems. We've seen the load average briefly go above 4.0, but I've been unable to determine why. When the load average is at 4.0 or above, total CPU usage is only about 20% (5% User plus 15% System, as reported by New Relic Server; see screenshot).
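
One thing I can rule in or out myself: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks (state R), so load can sit well above what CPU usage alone would suggest. Below is a rough sketch for taking a snapshot of task states next to the load average, assuming Python 3 and the standard /proc layout. The function names are illustrative, and a single snapshot can easily miss short-lived spikes.

```python
import glob


def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg content."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def task_state(stat_text):
    """Extract the state letter from a /proc/<pid>/task/<tid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(stat_lines):
    """Count how many tasks are in each state (R, S, D, ...)."""
    counts = {}
    for line in stat_lines:
        state = task_state(line)
        counts[state] = counts.get(state, 0) + 1
    return counts


def read_task_stat_lines():
    """Read one stat line per thread; threads that exit mid-scan are skipped."""
    lines = []
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as f:
                lines.append(f.read())
        except OSError:
            continue
    return lines


if __name__ == "__main__":
    try:
        with open("/proc/loadavg") as f:
            print("load averages:", parse_loadavg(f.read()))
    except OSError:
        print("/proc/loadavg not available")
    counts = count_states(read_task_stat_lines())
    print("runnable (R):", counts.get("R", 0))
    print("uninterruptible (D):", counts.get("D", 0))
```

If the D count is regularly nonzero while the load is high, that would point toward tasks blocked on I/O or on the hypervisor, not toward a miscalculated load average.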
