S5397 - i5400pw - BIOS 1.03 - Ubuntu system freeze

Discussion in 'Tyan' started by Mark Jeynes, Mar 15, 2009.

  1. Mark Jeynes

    Mark Jeynes Guest

    I run a system with this mobo and Ubuntu 8.04 server 64-bit. Recently
    the system has been hanging after about 20-30 mins, sometimes reporting
    ata disk errors.

    If I reboot, the board boots as far as "Mouse Initialized" then hangs for
    3 mins or so. After this it reports an error with disk 1.

    Powering down for 3-4 mins and rebooting, the system will boot
    successfully again. Only it hangs 20 mins or so later and we're back
    into the same loop. The hang seems to occur when there is substantial
    SATA activity. After boot my 3 SATA disks start a RAID resync which
    it says will take 240 mins to complete - all disk activity stops 15 mins
    or so later and the system is frozen. Same result on a variety of recent
    kernels.

    The same condition occurs if I do not start the RAID software and run
    concurrent but independent disk integrity checking software on the SATA
    disks (i.e. I'm using the 'badblocks' utility under Linux).
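
    A rough sketch of how the concurrent check described above could be
    scripted. The device names are placeholders, not the actual disks in
    this system, and it assumes 'badblocks' is installed and the script is
    run as root. With no -n or -w option, badblocks does a read-only test.

```python
import subprocess

# Placeholder device names (an assumption); substitute the real SATA disks.
DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd"]


def badblocks_command(device):
    """Build a read-only badblocks command (-s shows progress, -v is verbose)."""
    return ["badblocks", "-s", "-v", device]


def run_concurrently(devices):
    """Start one badblocks process per disk, then wait for all of them.

    Returns the list of exit codes, in the same order as `devices`.
    """
    procs = [subprocess.Popen(badblocks_command(d)) for d in devices]
    return [p.wait() for p in procs]


# Usage (needs root and real devices, so it is left commented out):
# print(run_concurrently(DISKS))
```

    Running the disks together rather than one at a time is the point here,
    since the hang only seems to show up under substantial SATA activity.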

    Right now I'm unsure if my issue lies with the disks or the mobo; it's
    hard to isolate the problem when you are dependent on the mobo to test
    the disks and vice versa!

    Any words of wisdom from you clever people will be gratefully received!
     
    Mark Jeynes, Mar 15, 2009
    #1

  2. Mark Jeynes

    Paul Guest

    One thing I've noticed here, as a home user, is that if a SATA disk
    has a problem, there doesn't appear to be a mechanism to reset the
    disk interface from the motherboard. When something similar happened
    to me, I had to power cycle, before the hard drive was reset and
    could be seen again.

    The fact that a reboot after a failure, in your case, results in an
    "error with disk1" which is cleared by powering down suggests the
    disk is the part that is hung up, rather than the motherboard.
    The chipset should be resettable on the reboot, so I wouldn't expect
    it to stay in a stuck state.

    Have you tried downloading the disk diagnostic from the disk
    manufacturer website ?

    Is there a chance the disk(s) are overheating ?

    Does the power supply have enough 12V amps for all
    the loads you have connected ?

    You could also try testing the disks as simple data disks on
    another computer. You could use something like the free version
    of HDTune for Windows as a test stimulus for the drives (i.e. no need
    for the OS to see a file system on the drive to test it). HDTune has
    a read benchmark that reads the disk surface, and also has an error scan.
    It also reads drive temperature via SMART (that is, as long as
    the port the disk is connected to can issue SMART commands).

    http://www.hdtune.com/download.html

    Paul
     
    Paul, Mar 16, 2009
    #2

  3. Mark Jeynes

    Mark Jeynes Guest

    Firstly I'd like to say a hearty thank you, Paul. Just seeing a reply
    this morning made me feel I'm not alone on this planet. Cheers mate.

    > Have you tried downloading the disk diagnostic from the disk
    > manufacturer website ?

    I did today ... on your advice (thank you! I'd not considered they would
    offer such a thing). The tool says my disks are fine :) (phew)

    > Is there a chance the disk(s) are overheating ?

    Possibly ... and I know they have in the past (smartctl told me). I did
    have them stacked in one of those 5-in-3 backplane caddies. Probably
    not a good idea when 3 neighbouring RAID disks decide to do a total
    resync. There's not much room in there for airflow, so this kind of
    load means things will get steamy - even though it's backed with a fan
    that could suck a golfball through six feet of hose.

    > Does the power supply have enough 12V amps for all
    > the loads you have connected ?

    I should say so ... it's a nice 750W supply from Silverstone.
    Hopelessly overdone but you know how gadget-lust takes over when
    shopping for machine parts.

    > You could also try testing the disks as simple data disks on
    > another computer.

    Wow. I'm humbled by your knowledge of this topic and very, very
    grateful (there you go, I said I would be).

    I believe my main problem is my desire to silence the machine as far as
    practical, so after turning off most case fans it was getting a bit hot
    in there. Though I can't back this with hardcore science, I've discovered
    that re-enabling a couple of case fans today gave me several hours of
    uptime, enough to complete the resync. This brought my recovery task to
    critical mass - resync done means disk activity mainly stops and the
    heat that was causing the problem subsides. That's my theory - but the
    beautiful truth is it's still going now. Now I can turn to a
    preventative course of action rather than a desperate recovery task.
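
    As one possible piece of that preventative work, drive temperature
    could be pulled out of 'smartctl -A' output along these lines. This is
    only a sketch: it assumes smartmontools is installed, that the drive
    reports temperature as SMART attribute 194 (some drives use a different
    attribute), and the sample line below is made up for illustration, not
    taken from the disks in this thread.

```python
import subprocess

# Illustrative line in the layout `smartctl -A` prints; the values are made up.
SAMPLE = (
    "194 Temperature_Celsius     0x0022   038   050   000    "
    "Old_age   Always       -       38 (0 14 0 0 0)"
)


def parse_temperature(smart_output):
    """Return the raw value of attribute 194 in degrees C, or None if absent."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (the raw value may have trailing extras).
        if len(fields) >= 10 and fields[0] == "194":
            return int(fields[9])
    return None


def read_temperature(device):
    """Run `smartctl -A` on a device (needs root) and parse the temperature."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return parse_temperature(result.stdout)


# Parsing the illustrative sample needs no hardware:
print(parse_temperature(SAMPLE))
```

    Logging this during a resync, with and without the case fans running,
    would be one way to put some numbers behind the overheating theory.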

    thanks again
     
    Mark Jeynes, Mar 17, 2009
    #3
