New 2895 System Freezing Up Issues

Discussion in 'Tyan' started by J. Hinkey, Mar 19, 2007.

  1. J. Hinkey

    J. Hinkey Guest

    Gents -

    I've just put together a new system with all new parts. It's a compute node of a small cluster so it was meant to do nothing but crunch
    numbers. It consists of:

    2895 Board
    2 x Opteron 252s (E-stepping) w/OEM heatsink/fans
    8 x 512MB Corsair Reg. ECC PC3200
    Antec Neo HE 550W PS
    80 GB Seagate SATA
    Sony DVD-ROM
    Mitsumi Floppy
    EVGA GF 7300GS PCI-E Card

    Suse Linux 10.0

    The system went together without problems as did the OS install. I did a quick memory test after the install and it found no problems. But
    after about one day the system suddenly started freezing at random times and continued to do so. If I let it sit idle it would be fine. After
    one freeze up the HD was corrupted beyond repair and I had to do a re-install of the OS.

    After re-installing the OS all seemed to be fine, but during its first use as a compute node it froze up about 10 minutes into the
    computation. Tried it again and several minutes into the computation it froze up. I quickly checked the CPU temps in the BIOS hardware monitor
    section and they seemed fine.

    Then all heck started breaking loose. I made some seemingly minor adjustments in the BIOS and could not get it to boot - it just would spit
    some random characters to the screen and freeze. Had to re-set the BIOS to get it to boot. Then it started freezing up when I was in the BIOS.

    I ran another memory test, but this time I let it run for a while. All was going well when the software reported an "unexpected interrupt" and
    froze up. I've had bad memory modules in other boards before and never suffered this type of error when checking the memory or had the BIOS
    freeze up on me.

    BTW I've got another 2895 system running just fine with Suse 10.0 and the same BIOS revision.

    Is this a bad MB or should I do things like:
    - re-install the CPUs
    - re-install the DIMMs (or swap out DIMMs to see if it makes any difference)
    - re-install the video board
    - swap PSUs

    Since it's crashing in the BIOS set-up I doubt it's a memory problem. Any suggestions before I call Tyan and try to work things out with them?

    Thanks -

    J. Hinkey, Mar 19, 2007
  2. Locking up in the BIOS, that's bad... I assume you installed the
    motherboard yourself; how did it go onto the standoffs? Any weird
    flexing or possibility of a fastener having been reefed down onto a
    trace? Something you might want to try, pain in the ass though it may
    be, take the cover off and loosen all the motherboard fasteners to make
    sure there is no flexing or intermittent shorts, then boot it
    up again and hope for better.

    If you were going to try reseating stuff, I'd try the CPUs first, then
    swap through the RAM 1 gig at a time. Then the other things you
    mentioned. I'm assuming of course that there were no relevant log
    entries like kernel panics, etc...
    Chris Sorenson, Mar 27, 2007
  3. J. Hinkey

    J. Hinkey Guest

    Nope, motherboard installation went just fine.
    I ended up talking extensively with Tyan and doing all of my list above except swap PSUs (the Tyan tech guys did not think this was the problem)
    and even got another video card. Nothing helped. No combination of CPUs (swapped, single, etc.) or memory made any difference. I finally got
    a replacement motherboard.

    Upon re-installation I found at least one bad DIMM that did not pass memtest86 (never using Corsair memory again!) and after swapping that out
    it's been up and stable for a week now with 2GB installed and under near continuous 100% load doing heavy FP intensive stuff.

    Next test is to try to install the rest of the RAM (needs 4GB) and see how it goes.

    Has anyone else out there had problems with Corsair Reg ECC DIMMs (512MB sticks) with this motherboard?

    Thanks - John
    J. Hinkey, Mar 27, 2007
  4. Bruce Burden

    Bruce Burden Guest

    : Has anyone else out there had problems with Corsair Reg ECC DIMMs
    : (512MB sticks) with this motherboard?
    Not Corsair, but I do have 2GB (4x512) of Infineon that
    did not even let me get to a POST. Absolutely nothing happened,
    beyond the fans spinning up. No beeps, no video. No nada.

    Some D32TB1GW memory from Super Talent solved the problem.
    That was after a board RMA, a PSU RMA, and finally purchasing an MSI
    board/CPU/memory that worked. Swapping parts around finally
    pinpointed the problem.

    Bruce Burden, Mar 28, 2007
  5. I've been using DDR PC-2100 266 MHz Corsair 512 MB sticks in my 2668
    for over three years with no problems...
    Chris Sorenson, Apr 2, 2007
