Jim Prescott wrote:
>
> A couple times a month the clock on one of our systems stops. Actually
> it gets stuck in a 3 second loop. Eg:
> Thu Jul 10 08:21:10 EDT 2003
> Thu Jul 10 08:21:08 EDT 2003
> Thu Jul 10 08:21:09 EDT 2003
> If we reset the time it then runs fine for a few weeks. Other than
> having the wrong time, the system seems to run fine. We normally keep
> time synchronized with a fairly old version of xntpd but as a test we
> turned it off for a few weeks and the problem still occurred. The system
> didn't lose track of time or its settings after being unplugged for ~20
> minutes so I suspect the motherboard battery is fine.
I had this same problem last month with a U10/440MHz, purchased 7/2000.
At the time, the system had an uptime of just over 400 days, so there
hadn't been any recent system changes. The clock did the same thing:
it would get stuck in a 3 second loop. Also, while
watching the clock with a simple "date;sleep 1" loop, I would see the
clock make huge jumps in time (days, months and even years - both
backwards and forwards) right before getting stuck. Most of the time
the huge jump would only last for 1 second before jumping back to
regular time and getting stuck, but sometimes it would get stuck at the
time it jumped to. Resetting the system date would only fix the problem
for a short while (from an hour to only a couple of minutes).
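
A rough way to automate that kind of watching is sketched below in
Python. It is not the "date;sleep 1" loop itself, just an illustration:
the names sample_clock and find_anomalies, and the one-second tolerance,
are assumptions made up for this example.

```python
import time


def sample_clock(count, interval=1, now=time.time, sleep=time.sleep):
    """Read the wall clock `count` times, pausing `interval` seconds between reads."""
    samples = []
    for _ in range(count):
        samples.append(int(now()))  # whole seconds, like the output of date
        sleep(interval)
    return samples


def find_anomalies(samples, interval=1, tolerance=1):
    """Return (index, delta) for every step that is not roughly `interval` seconds.

    A backwards step (negative delta) or a large forward jump both show up here.
    The tolerance is an assumed allowance for ordinary scheduling jitter.
    """
    anomalies = []
    for i in range(1, len(samples)):
        delta = samples[i] - samples[i - 1]
        if abs(delta - interval) > tolerance:
            anomalies.append((i, delta))
    return anomalies
```

Fed the seconds from the quoted example (10, 8, 9, 10, 8, 9), find_anomalies
flags each step back of 2 seconds; calling find_anomalies(sample_clock(60))
would watch a live system for about a minute.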
Turning off xntpd didn't change anything, neither did rebooting the
system. On reboot, the "IDPROM contents invalid" message came up, so I
figured it was just a bad IDPROM. I made sure everything inside the
machine was physically good (RAM seated, drive and power connectors
tight, etc.). Just to make sure, I even tried swapping out the 1Gig
Kingston RAM in the machine with the original 128Meg Sun RAM that the
machine came with. Still the problem would occur within a few minutes
of booting up. Once or twice when I rebooted it, the system wouldn't
come up. The fans and hard drives would spin up, but the front panel
light wouldn't come on, even after waiting as long as 10+ minutes. At
the time, I just thought it was related to the bad IDPROM.
I ordered a new IDPROM chip from Mouser, which took a couple days to
deliver. As a temporary measure, I had a script that checked the system
time(), sleeping 5 seconds between checks. If the difference between
checks wasn't 5, I'd force an ntpdate against another system on the
local network. Not the prettiest solution, but it was very early Sunday
morning and I knew the new IDPROM wouldn't arrive until Tuesday.
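
A stopgap along those lines might look something like the following
Python sketch. The original script isn't shown, so the details here are
assumptions: the function names, the one-second tolerance (the original
tested for a difference of exactly 5), and whatever host name is passed
in. It also assumes ntpdate is on the PATH and that the script runs with
enough privilege to set the clock.

```python
import subprocess
import time


def clock_jumped(previous, current, expected=5, tolerance=1):
    """True if the time elapsed between two readings is not roughly `expected` seconds."""
    return abs((current - previous) - expected) > tolerance


def watch(server, interval=5, tolerance=1, checks=None,
          now=time.time, sleep=time.sleep, resync=None):
    """Poll the clock every `interval` seconds and resync when it misbehaves.

    Runs forever when `checks` is None; otherwise stops after that many
    checks and returns how many resyncs were forced.
    """
    if resync is None:
        # Assumed invocation: step the clock from another host on the LAN.
        def resync():
            subprocess.run(["ntpdate", server], check=False)

    resyncs = 0
    done = 0
    previous = now()
    while checks is None or done < checks:
        sleep(interval)
        current = now()
        if clock_jumped(previous, current, interval, tolerance):
            resync()
            resyncs += 1
            current = now()  # re-read, since the clock has just been stepped
        previous = current
        done += 1
    return resyncs
```

Something like watch("ntp-host.example") would then run until killed,
where ntp-host.example is a placeholder for the other machine on the
local network.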
When the new IDPROM arrived, I installed it and reprogrammed the hostid.
The clock seemed stable, and the system ran fine all that day. The next
morning the problem returned again, doing the same wild jumps and
getting stuck in a loop. I started going through the same things I had
tried before, but now after rebooting the system it wouldn't come up
again (fans and drives turning, but no light or activity). It would
take several attempts before finally booting. After another couple
system reboots, it refused to come up no matter how many times I tried
power cycling the system. I even tried it with the hard drives and CD-
ROM unplugged, just in case it might be the power supply.
That afternoon, we were able to get in touch with a company that
services Sun equipment, and we brought the machine down to them where
they replaced the motherboard. After swapping boards, the system booted
right up every time and the clock was stable. Nothing else was changed:
no new patches were installed, and it still has the original power
supply.
For the past month now, the system has been back up and running. I had
my script watching the system time, but there wasn't anything other than
the usual little bit of drift. I just reenabled xntpd a couple days ago
since everything is back to normal now.
As a footnote, the day after getting the system back, the tech who did
the replacement called to make sure everything was fine. He said he had
tried installing Solaris on the old motherboard (with RAM and drives
that he had there). It powered on at first, but locked up during the
install, and wouldn't boot at all after that.
I've only found one reference to this type of problem happening to
anyone else:
http://google.com/groups?threadm=b6s....kreonet.re.kr
Just to toss out an idea, but I know a couple years ago some PC
motherboard makers had some problems with bad capacitors that led to
systems crashing and not booting. When I had my system open I looked
the board over pretty closely and didn't see anything that looked
obvious (like warped or leaking caps). But maybe it's something
similar? It's pretty strange for 5 machines (your two, mine, and two in
the above thread) all running into the same odd clock looping problem,
especially within a couple months of each other.
Ivan Richwalski