I upgraded one of my main servers to the Tyan S2882 and Opteron 246s a few months back. Since then, I have noticed semi-random system lockups running Red Hat Enterprise Linux WS release 3 (Taroon Update 3) with the 2.4.21-20.ELsmp kernel. The lockups seem to occur around times of high file system I/O, i.e. backups, tar runs, etc. The symptoms: I can ping the machine, and I can even portscan it and see the active TCP ports, but I can't ssh in (or get into the machine any other way) and the console is completely locked. The machine was running a RAID off an Adaptec SCSI PCI-X card until I suspected that was the problem; since then I have removed the card and replaced the RAID with two internal 400 GB SATA drives (sata_sil). The lockups still happen under high I/O on the filesystem. The strange thing is, I get NOTHING in syslog. Nothing at all. In fact, when I hard-power-cycle the hung machine and run `last` once it is back up, all I see is the reboot I did. It's like the machine is in some funky pseudo-up state... I've tried new memory with no success either. I'm pulling my hair out at this point trying to figure out my next step. Any help or ideas would be great.
Try a current kernel. SATA support in particular is pretty sparse in its error reporting in older kernels, and 2.4.21 is almost historic by now. If you are lucky, the problem goes away. If you are less lucky, you at least get an error message. Arno
At one point, when the SCSI card was still in the machine, I did try a kernel in the 2.6 range and got the same lockups. I will try again with SATA. Any version recommendations? Thanks