How can I determine the root cause of an unstable clock source?
We noticed one of our droplets reported the following line in the messages log:
kernel: TSC appears to be running slowly. Marking it as unstable
We have many droplets running Asterisk 1.4 on CentOS 5.9 with the 2.6.18-308.1.1.el5 Linux kernel. Only one of them appears to be having a significant issue with keeping time.
I came across a blog article, "Fixing unstable clocksource in virtualised CentOS", but before I try implementing a clocksource failover, I wanted to ask the community here: why might the clocksource on this one droplet be running slowly?
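As a starting point for comparing the affected droplet with the healthy ones, here is a minimal sketch that reports which clocksource the kernel is currently using and which alternatives it offers. It assumes the standard sysfs clocksource directory is present, which has not been verified on the 2.6.18 kernel, and that Python 3 is available, which is not the default on CentOS 5. The helper names are made up for illustration.

```python
from pathlib import Path

# Assumed sysfs location of the clocksource files; adjust if your kernel differs.
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_clocksources(current_text, available_text):
    """Return (current, available_list) from the raw sysfs file contents."""
    current = current_text.strip()
    # The available list is whitespace-separated on a single line.
    available = available_text.split()
    return current, available


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Read current and available clocksources; return None if the files are missing."""
    try:
        current_text = (base / "current_clocksource").read_text()
        available_text = (base / "available_clocksource").read_text()
    except OSError:
        return None
    return parse_clocksources(current_text, available_text)


if __name__ == "__main__":
    result = read_clocksources()
    if result is None:
        print("clocksource sysfs entries not found")
    else:
        current, available = result
        print("current:  ", current)
        print("available:", ", ".join(available))
```

Running this on the slow droplet and on a healthy one would show whether they ended up on different clocksources after the kernel marked the TSC as unstable.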
Could a slow clocksource also result in incorrect load averages? This droplet has four CPU cores, and its reported load averages are higher than those of other similar systems. We've seen the load averages briefly go above 4.0, but I've been unable to determine why. When the load average is at 4.0 or above, total CPU usage is only about 20% (5% User plus 15% System, as reported by New Relic Server, see screenshot).
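One thing that could be checked while the load average is high: on Linux the load average counts tasks in uninterruptible sleep (state D) as well as runnable tasks (state R), so a high load with low CPU usage can come from tasks blocked on I/O rather than from the clock. The sketch below counts process states from /proc. It assumes Python 3, uses illustrative helper names, and only looks at each process's main thread, whereas the kernel counts every thread, so the numbers are a rough guide only.

```python
import os
from collections import Counter


def task_state(stat_line):
    """Extract the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so take everything after the last ')'.
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states(stat_lines):
    """Count how many stat lines are in each state (R, S, D, ...)."""
    return Counter(task_state(line) for line in stat_lines)


def read_stat_lines(proc="/proc"):
    """Collect the stat line of every process currently listed under /proc."""
    lines = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                lines.append(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
    return lines


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("loadavg:", f.read().strip())
    states = count_states(read_stat_lines())
    print("runnable (R):       ", states["R"])
    print("uninterruptible (D):", states["D"])
```

If the D count is regularly nonzero when the load spikes, that would point toward blocked I/O rather than a timekeeping problem.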