Gigabyte GA-EP45-DS3R (v1.0) RAID-1 (mirrored) failed.

Discussion in 'Gigabyte' started by Dave Pollum, Aug 19, 2009.

  1. Dave Pollum

    Dave Pollum Guest

    GA-EP45-DS3R v1.0 mobo. Had 2x Seagate Barracuda SATA, 7200rpm
    1,000GB (1TB). I built a RAID-1 (mirroring) system using the RAID
    BIOS, because my PC _used_ to use a single non-RAID Seagate SATA
    drive, and that drive failed. So I figured that I was safe with
    mirroring. When I booted up this morning, the BIOS showed that one of
    the RAID drives was bad and that the RAID volume had "failed". I
    heard clicking maybe 4 times yesterday afternoon, but there was no
    notice that a drive was failing.
    I disconnected the bad drive and tried to boot - the BIOS can NOT find
    the boot loader on the good drive. Thinking that when a RAID-1 drive
    goes bad all I have to do is replace it, I bought a new Western
    Digital 1TB drive (I'm done with Seagates) and plugged it in place of
    the bad Seagate. The PC still won't boot. If I want to create
    another RAID volume, it looks like the BIOS wants to erase my HD!!
    And it will do the same if I change from RAID to non-RAID. I've got
    LOTS of important stuff on the RAID drives and I can't wipe the drives
    and start over.

    What do I do???

    BTW - I'm writing this on an old PC that uses IDE, not SATA, drives.
    -Dave Pollum
     
    Dave Pollum, Aug 19, 2009
    #1

  2. Paul

    Paul Guest

    Boot with another OS and do your forensics on it. I use
    the Linux LiveCD Knoppix from knopper.net, but there are
    probably others you could use. Ubuntu from ubuntu.com even.

    The first check would be to see whether there are any partitions.

    Then try and mount them in Linux. Linux can now handle
    both FAT32 and NTFS, as well as EXT2 or EXT3 native formats.
    You can even mount Macintosh disks if you need to (but only
    to copy data off - a Mac volume isn't sitting on the desktop).
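
    For that first partition check, here is a minimal Python sketch that
    reads the classic MBR partition table from sector 0. It assumes
    512-byte sectors, an ordinary MBR (not GPT) on the surviving mirror
    member, and a raw image or device opened read-only. The function name
    is hypothetical and the type-ID table is only a small sample.

```python
import struct

SECTOR_SIZE = 512      # assumption: classic 512-byte sectors
PTABLE_OFFSET = 446    # MBR partition table starts at byte 446
ENTRY_SIZE = 16        # four 16-byte entries follow

# A small sample of MBR partition type IDs (not exhaustive).
PARTITION_TYPES = {
    0x07: "NTFS/exFAT",
    0x0B: "FAT32 (CHS)",
    0x0C: "FAT32 (LBA)",
    0x83: "Linux (ext2/ext3)",
    0xEE: "GPT protective",
}


def parse_mbr(sector0: bytes):
    """Return (signature_ok, partitions) parsed from the first sector of a disk."""
    if len(sector0) < SECTOR_SIZE:
        raise ValueError("need the full 512-byte first sector")
    # A valid MBR ends with the boot signature 0x55 0xAA.
    signature_ok = sector0[510:512] == b"\x55\xaa"
    partitions = []
    for slot in range(4):
        start = PTABLE_OFFSET + slot * ENTRY_SIZE
        entry = sector0[start:start + ENTRY_SIZE]
        ptype = entry[4]
        if ptype == 0:
            continue  # unused slot
        # Bytes 8-11: first LBA, bytes 12-15: sector count (little-endian).
        first_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        partitions.append({
            "slot": slot + 1,
            "bootable": entry[0] == 0x80,
            "type": PARTITION_TYPES.get(ptype, hex(ptype)),
            "first_lba": first_lba,
            "size_bytes": num_sectors * SECTOR_SIZE,
        })
    return signature_ok, partitions
```

    Usage would be something like opening a (hypothetical) image file
    "good_disk.img" in "rb" mode and passing it the first 512 bytes. A
    missing 0x55AA signature or an empty table points toward MBR damage;
    a sane-looking table means the problem is more likely elsewhere.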

    The TestDisk program can be used to scan a disk for partitions,
    and rebuild a partition table to match. If you've recently
    "deleted" partitions, it may find them. You have to be
    careful to check the results, before accepting them and
    updating the partition table. A copy of this may be on
    the Linux boot CD already. Versions are available for
    Windows as well. The main benefit in your situation would
    be to see whether a partition can be recognized. It takes a while
    to read the entire disk. (If you need to abort at any
    menu level, press <control>-<C> to quit. The last time I used
    it, not all menu levels had a quit command.)

    http://www.cgsecurity.org/wiki/TestDisk_Step_By_Step
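
    TestDisk does far more than this, but the basic idea of scanning a disk
    for partitions can be sketched in a few lines of Python. This version
    only looks for NTFS boot sectors, which carry the OEM ID "NTFS" plus
    four spaces at byte 3 and the 0x55AA signature at byte 510. It assumes
    512-byte sectors and a raw image or device opened read-only; the
    function name is hypothetical. Expect an intact NTFS volume to show up
    twice, since NTFS keeps a backup copy of its boot sector at the end of
    the volume.

```python
import struct
from typing import BinaryIO, List, Tuple

SECTOR_SIZE = 512            # assumption: 512-byte sectors
NTFS_OEM_ID = b"NTFS    "    # bytes 3..10 of an NTFS boot sector


def find_ntfs_boot_sectors(disk: BinaryIO,
                           chunk_sectors: int = 2048) -> List[Tuple[int, int]]:
    """Scan a raw disk/image; return (lba, total_sectors) for each NTFS boot sector."""
    hits = []
    base_lba = 0
    while True:
        chunk = disk.read(SECTOR_SIZE * chunk_sectors)
        if not chunk:
            break
        for i in range(len(chunk) // SECTOR_SIZE):
            sector = chunk[i * SECTOR_SIZE:(i + 1) * SECTOR_SIZE]
            if sector[3:11] == NTFS_OEM_ID and sector[510:512] == b"\x55\xaa":
                # Offset 0x28 holds the volume's total sector count (8 bytes, LE).
                total_sectors = struct.unpack_from("<Q", sector, 0x28)[0]
                hits.append((base_lba + i, total_sectors))
        base_lba += len(chunk) // SECTOR_SIZE
    return hits
```

    Reading in chunks keeps the number of read calls down, but as noted
    above, going through a whole 1TB disk still takes a long time.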

    Once you understand what is on the "good" disk, maybe then
    you'll be able to form a plan.

    If you make absolutely no progress via Linux, you can slave
    the "good" drive to another Windows PC. The following are examples
    of scavenger programs that might get some of your files
    back by searching for fragments. This approach is reserved
    for times when the file system is toast, and you're trying
    to get what you can from the media.

    http://www.cgsecurity.org/wiki/PhotoRec

    This copy is a backup version of a web site that has since
    disappeared. This software was eventually sold by the author,
    to another company as the basis of a commercial offering.
    At least one person has been able to get back files from
    an NTFS partition with it. Any number of $39.95 data recovery
    programs may do the same thing (some teasing you with a list
    of file names, before asking for the money).

    http://www.pricelesswarehome.org/WoundedMoon/win32/driverescue19d.html
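
    To show what "searching for fragments" means, here is a deliberately
    naive header/footer carving sketch in Python for JPEG files. It is not
    how PhotoRec or DriveRescue work internally; the function name is
    hypothetical, it assumes `data` is a chunk read from a disk image, and
    it misses files that are fragmented or that straddle a chunk boundary.

```python
from typing import List, Tuple

JPEG_SOI = b"\xff\xd8\xff"   # JPEG start-of-image marker (plus next marker's 0xFF)
JPEG_EOI = b"\xff\xd9"       # JPEG end-of-image marker


def carve_jpegs(data: bytes,
                max_size: int = 20 * 1024 * 1024) -> List[Tuple[int, bytes]]:
    """Naive carving: return (offset, bytes) for each header/footer pair found."""
    found = []
    pos = data.find(JPEG_SOI)
    while pos != -1:
        # Look for the end marker within max_size bytes of the header.
        end = data.find(JPEG_EOI, pos + len(JPEG_SOI), pos + max_size)
        if end == -1:
            # No footer in range: skip this header and try the next one.
            pos = data.find(JPEG_SOI, pos + 1)
            continue
        found.append((pos, data[pos:end + len(JPEG_EOI)]))
        pos = data.find(JPEG_SOI, end + len(JPEG_EOI))
    return found
```

    Stopping at the first end marker is the weak spot: a photo with an
    embedded thumbnail contains an earlier 0xFFD9, so the carved file can
    come out truncated. Real scavenger programs parse the file structure
    instead.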

    My guess would be, the array was in a "degraded" state from
    initial installation (one drive effectively failed or not
    mirroring at all), and the array was not rebuilt for some
    reason. Mirrors have two bad states. "Degraded" when one disk fails.
    "Failed" when two disks fail.

    One thing about RAID is, it should be thoroughly tested, before
    you use it for anything. Build your RAID 1 mirror. Put some
    test files on it. Disconnect one drive. Does the second drive
    work ? Now, security erase the second drive on another
    computer (so you can pretend it is a new replacement drive).
    Plug it in. Does the rebuild work properly ? Disconnect the
    other drive. Does the mirror work properly with just
    the "rejuvenated" drive ? Go through a sequence that
    proves the data is being transferred successfully in all cases,
    so you can trust the thing to work right on a failure. Along the way,
    you become comfortable with the maintenance procedure too, while
    there is no valuable data on the thing to lose.
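
    The order of that drill is easiest to see in a toy model. This Python
    sketch is not how any real controller works: ToyMirror and fire_drill
    are hypothetical names, each "disk" is just a byte array, and the
    "normal"/"degraded"/"failed" states follow the terms used above.

```python
from typing import List


class ToyMirror:
    """In-memory stand-in for a two-disk RAID-1 (one byte per 'sector')."""

    def __init__(self, num_sectors: int) -> None:
        self.disks = [bytearray(num_sectors), bytearray(num_sectors)]
        self.online = [True, True]

    @property
    def state(self) -> str:
        healthy = sum(self.online)
        if healthy == 2:
            return "normal"
        return "degraded" if healthy == 1 else "failed"

    def write(self, lba: int, value: int) -> None:
        if self.state == "failed":
            raise OSError("array failed")
        for i, disk in enumerate(self.disks):
            if self.online[i]:
                disk[lba] = value  # a working mirror writes every online member

    def read(self, lba: int) -> int:
        for i, disk in enumerate(self.disks):
            if self.online[i]:
                return disk[lba]
        raise OSError("array failed")

    def pull(self, member: int) -> None:
        """Simulate a drive failure by taking one member offline."""
        self.online[member] = False

    def replace_and_rebuild(self, member: int) -> None:
        """Install a blank replacement for `member`, then copy from the survivor."""
        survivor = 1 - member
        if not self.online[survivor]:
            raise OSError("no surviving member to rebuild from")
        blank = bytearray(len(self.disks[survivor]))  # the "security erased" disk
        blank[:] = self.disks[survivor]               # rebuild: copy every sector
        self.disks[member] = blank
        self.online[member] = True


def fire_drill() -> List[str]:
    """Run the test sequence from the post; return the array states seen."""
    m = ToyMirror(num_sectors=8)
    m.write(3, 42)             # put some test data on it
    states = [m.state]
    m.pull(0)                  # disconnect one drive
    assert m.read(3) == 42     # does the second drive work?
    states.append(m.state)
    m.replace_and_rebuild(0)   # plug in the "new" drive and rebuild
    states.append(m.state)
    m.pull(1)                  # disconnect the other drive
    assert m.read(3) == 42     # does the rejuvenated drive hold the data?
    states.append(m.state)
    return states
```

    On real hardware each of those steps is a cable pull or a BIOS/utility
    action, and each `assert` is you checking the test files by hand.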

    I've heard of this happening on a SIL3112 set up in RAID 1 mode.
    The user had a "degrade" one day, leaving the second disk as the
    only source of data. The data turned out to be three months
    stale, and all updates from the last three months were lost. That
    means the SIL3112-based system had stopped mirroring at least three
    months earlier. The user appeared to be savvy enough to know about
    array status, and claims there was no warning that the mirror was
    not working. So while he didn't lose all the data, effectively
    it's the same as your case. The mirror wasn't working, and the
    software didn't say anything.

    It occurred to me at the time that these mirrors really need a
    means of doing a verification somehow, offline. Say the
    BIOS had a "check my mirror" command: you could leave it
    running overnight, verifying whether the sectors on both
    drives matched. Instead, the designers of this crap
    have decided that a "rebuild" is how you assure yourself they're
    equal, which is not quite the same thing. I'd really want a
    means of checking for divergence, as a way of seeing how *well*
    the mirror is working. (If they don't match, you'd know there is
    a lack of functionality.)
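
    Short of the BIOS offering such a command, the offline check can be
    approximated by hand: take the array offline, attach both members as
    plain disks (or image them), and compare them sector by sector. A
    minimal Python sketch, assuming 512-byte sectors and both members
    opened read-only; the function name is hypothetical. Many chipset
    RAIDs keep their own metadata near the end of each member, so a few
    differing sectors right at the end may be legitimate.

```python
from typing import BinaryIO, List

SECTOR_SIZE = 512  # assumption: 512-byte sectors


def mirror_divergence(a: BinaryIO, b: BinaryIO, chunk_sectors: int = 2048,
                      limit: int = 100) -> List[int]:
    """Compare two mirror members; return up to `limit` LBAs whose contents differ."""
    mismatches: List[int] = []
    chunk_bytes = SECTOR_SIZE * chunk_sectors
    base_lba = 0
    while len(mismatches) < limit:
        ca, cb = a.read(chunk_bytes), b.read(chunk_bytes)
        if not ca and not cb:
            break  # both members fully read
        if ca != cb:
            # Only walk sector by sector when the chunk as a whole differs.
            for off in range(0, max(len(ca), len(cb)), SECTOR_SIZE):
                if ca[off:off + SECTOR_SIZE] != cb[off:off + SECTOR_SIZE]:
                    mismatches.append(base_lba + off // SECTOR_SIZE)
                    if len(mismatches) >= limit:
                        break
        base_lba += chunk_sectors
    return mismatches
```

    An empty list after an overnight run is the reassurance a rebuild does
    not give you; a long list means the members have diverged, as in the
    SIL3112 story. If one member is shorter, its missing tail is reported
    as mismatches.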

    If the problem on your remaining disk was a corrupted MBR, then
    you could use "fixmbr" in a Windows environment. This article
    claims the command accepts a disk argument, so you should be
    able to fix more than one disk with it.

    http://pcsupport.about.com/od/termsf/p/fixmbr.htm

    Paul
     
    Paul, Aug 19, 2009
    #2

  3. Dave Pollum

    Dave Pollum Guest

    Paul;

    UGH...lots of work.
    When booting my PC, I occasionally saw "rebuild" for the RAID's
    status. I assumed this meant things were working OK. Unfortunately,
    the EP45-DS3R manual (including the latest one on GA's website) only
    mentions how to set up their on-board RAID, not what to do if
    something goes wrong. So, I'm quite frustrated - I can't get anything
    done (I freelance) until I get this %^&*#$ mess fixed.
    Would a separate RAID card be a better choice than the on-board RAID
    controller, and what would be a good choice?
    -Dave Pollum
     
    Dave Pollum, Aug 19, 2009
    #3
  4. Paul

    Paul Guest

    Well, you're at the point of doing DIY data recovery, so
    yes, there is work and a learning curve. At this point,
    it is hard to tell how much trouble you're in. Maybe the only
    good copy of the data, is on the failed disk ?

    Some motherboard makers include a copy of the Matrix Storage
    manual on the motherboard CD. Sometimes you have to "explore"
    the CD to find things like this. I have a few CDs here where
    there is a "manuals" folder. This would be an example of an
    Intel manual. (I believe one of the older versions had a
    pretty nice fault recovery section. Sometimes, when they
    rewrite these manuals, they throw out the good stuff and
    just leave the marketing fluff.)

    http://download.intel.com/support/chipsets/imsm/sb/manual70.pdf

    "Degraded" and "failed" cases are described here, on web pages. This
    would be the main page for Matrix RAID at Intel.

    http://www.intel.com/support/chipsets/imsm/

    And there is a newer manual than the first link. I haven't flipped
    through the pages of this one yet.

    http://download.intel.com/support/chipsets/imsm/sb/8_x_raid_ahci_users_manual.pdf

    *******

    I think maybe a little philosophy is in order, before you decide what
    to do next.

    The purpose of RAID is to allow maintenance on a computer to be
    put off until a more convenient time. As an example, say I have a
    company with 100 employees, and one volume of the RAID 1 dies at
    2PM in the afternoon. Before the advent of RAID, the company would
    be at a standstill for three hours, until the disk is restored from
    tape and the server can be put online again.

    If that server had a RAID1, the array status goes to "degraded", and
    the remaining good disk captures all the transactions. The IT staff can
    wait until closing time, to install a replacement disk, and start
    the rebuild, to take the array from "degraded" to "fully operational"
    again. If a rebuild is done in the middle of the day, sometimes it
    sucks all the performance out of the server (I've experienced that
    at work - some rebuilds actually have a control that sets how much
    bandwidth is used for the rebuild).
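
    That bandwidth control amounts to a rate limit on the rebuild copy. A
    minimal Python sketch of the idea, assuming the rebuild is a plain
    sequential copy from the surviving member to the new one;
    throttled_copy and its parameters are hypothetical, and real
    controllers do this in firmware or drivers.

```python
import time
from typing import BinaryIO


def throttled_copy(src: BinaryIO, dst: BinaryIO, max_bytes_per_sec: float,
                   chunk_size: int = 1 << 20) -> int:
    """Copy src to dst without exceeding max_bytes_per_sec on average."""
    if max_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    start = time.monotonic()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        copied += len(chunk)
        # If we are ahead of the allowed rate, sleep until back on schedule.
        ahead_by = copied / max_bytes_per_sec - (time.monotonic() - start)
        if ahead_by > 0:
            time.sleep(ahead_by)
    return copied
```

    The trade-off is the one described above: a lower rate leaves more
    performance for users, but the array stays degraded longer.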

    So the main purpose of RAID is as a delaying tactic, so that on a
    long weekend the company doesn't have to pay time and a half for
    someone to come in and fix the equipment. It means maintenance can
    be scheduled, rather than being an instant disaster.

    So if that is the case, what is the missing element ? It is backups.
    Why would people with RAID equipment need backups ? Because *all*
    the equipment can fail in one shot.

    We had a couple of complete failures in prime time at work. A *hardware*
    RAID controller, one of those expensive boards with its own processor,
    decided to write zeros over a critical portion of a RAID 5. Since all
    volumes were damaged at the same time by that firmware bug, the
    staff had no choice but to restore from tape at 2PM in the afternoon.
    Because that server carried network-licensed software, hundreds of staff
    could not work. My estimate is that the outage cost the company hundreds
    of thousands, as at least a few of the employees in question would
    just walk out of the building. They wouldn't even stick around to
    see if the server would come back online.

    There are a few failure mechanisms, where *no* RAID is safe enough.
    For example, on your desktop computer right now, if the power supply
    12V rail decides to output +15V for the next 30 seconds, *all* the
    disk drive motors would burn. Now, none of the disks on the computer
    work any more. How would a person protect themselves against
    such a failure ? Backups. (I actually got confirmation from one poster,
    that he had in fact experienced the overvoltage case, and had drives
    ruined by the power supply. So it does happen.)

    Then the question becomes, "so I gotta do backups, then what good
    is the RAID 1?". OK, what the RAID1 is buying you, is some degree
    of data redundancy. It is like an "extra backup", which most of
    the time, is going to be there for you. So the RAID1 can still
    have some value. But it should not be relied upon, to "cure cancer".
    It should not be your only asset.

    So, if you're going to use RAID...

    1) Learn to use it. It doesn't matter if it is chipset RAID or
    hardware RAID. Like a fire drill, make a "pretend emergency"
    and repair it. That might even mean, having a spare disk handy,
    just like you'd need in a real emergency, and so on. Consider
    all the scenarios that should be tested. For the RAID1, that means
    simulating a drive failure, rebuilding to a new disk, pulling the
    old disk and verifying the data is still there on the rejuvenated
    disk. A RAID5 would have its own set of test cases and "fire drills".

    2) Continue to do backups. If the computer is hit by lightning, is burned,
    overvolts on +12V, is hit by a virus and all files are erased, your
    "offline USB external" is what is going to get you up and running
    again. You also have more than one USB external, because some of
    those products are unreliable (check Newegg reviews for some examples).

    In terms of hardware RAID, the forums at 2cpu.com might have some
    discussions, or perhaps over on storagereview.com there might be
    something. I don't know how busy storagereview is these days. At
    least some people on 2cpu.com have tested a few different RAID
    controller boards. So you might see comments about Areca or
    LSI Logic and so on.

    You can also check reviews on Newegg, for the RAID cards, and maybe
    get some ideas there.

    Real RAID card testing takes months. So for someone reviewing a hardware
    card, it takes a while to test enough emergency conditions and the
    like, to develop an opinion about the product.

    To give an example, I've heard of the odd case, where a person installs
    a RAID5 with three disks. One disk fails, which in theory drops the
    array to "degraded". The array also happens to contain the boot
    partition. A few people have experienced a "failure" instead, where
    even though there are enough drives for the data to be available,
    the computer cannot boot. For some reason, if the array has four drives,
    the same problem doesn't seem to happen. You can imagine, it would take
    a lot of test cases, to get some feeling for how many quirks any product
    might have.

    HTH,
    Paul
     
    Paul, Aug 20, 2009
    #4
  5. Dave Pollum

    Dave Pollum Guest

    Paul;

    Thanks for sharing your knowledge and experience, and pointers to web
    sites. I really do appreciate it. If I sounded ungrateful it's
    because I'm really stressed out over this. Right now I've disabled
    mobo RAID and I'm running Seagate's SeaTools to make sure that the
    drives are actually readable. So hopefully I'll be able to get the
    data off of the drives.
    -Dave Pollum
     
    Dave Pollum, Aug 20, 2009
    #5
  6. I don't know the direct answer you are looking
    for, i.e. how to directly recover and boot from what you thought was a
    mirror of your hard drive, to get you instantly back to where you were.

    But some comments:

    WHAT I WOULD DO NOW:

    Connect the remaining supposedly good drive to another computer, either
    directly via SATA cable or via a USB to SATA converter (these are cheap,
    under $20). DO NOT TRY TO BOOT FROM THAT DRIVE, or do anything else
    that would write to it. See if you can access that drive. If so, copy
    your "stuff" from it to the other computer. If not .... there is a
    chance that it's truly gone forever.
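
    If the drive does show up, a copy that checks itself is worth the
    extra minute of setup. A minimal Python sketch, assuming the old drive
    is mounted read-only (on Linux, for example, with `mount -o ro`);
    copy_verified is a hypothetical name. Each source file is read only
    once, which is kinder to a drive that may be failing: the checksum is
    computed while copying and compared against a second read of the copy.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List, Tuple

CHUNK = 1 << 20  # read 1 MiB at a time


def sha256_file(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            h.update(block)
    return h.hexdigest()


def copy_verified(src_root: Path,
                  dst_root: Path) -> Tuple[List[Path], List[Tuple[Path, str]]]:
    """Copy every file under src_root into dst_root; return (copied, failed)."""
    copied: List[Path] = []
    failed: List[Tuple[Path, str]] = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_root / src.relative_to(src_root)
        try:
            dst.parent.mkdir(parents=True, exist_ok=True)
            h = hashlib.sha256()
            with open(src, "rb") as fin, open(dst, "wb") as fout:
                for block in iter(lambda: fin.read(CHUNK), b""):
                    h.update(block)  # hash what was read from the old drive...
                    fout.write(block)
            shutil.copystat(src, dst)  # keep timestamps (metadata only)
            if sha256_file(dst) != h.hexdigest():  # ...and compare with the copy
                raise OSError("checksum mismatch after copy")
            copied.append(src)
        except OSError as exc:
            failed.append((src, str(exc)))  # note it and keep going
    return copied, failed
```

    Pointing it at the most important folders first makes sense, since
    the drive may not last through a full copy. The `failed` list tells
    you which files need a second attempt or a recovery tool.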

    COMMENTS FOR THE FUTURE:

    I'm going to get a lot of flak on this, but in general, individual home
    users should forget about RAID. Period. Just forget about it. It's
    not worth the trouble and the hassle. RAID is for enterprise class
    servers using dedicated high-end RAID cards. It's not for desktop
    systems (ok, now you can flame me on this one).

    This is probably not news to you, but the importance of backups simply
    CAN NOT be overstated. Your problem is that you thought that what you
    were doing WAS an adequate backup. WRONG. For several reasons.

    First, there MUST be a backup EXTERNAL to the system being backed up.
    Obviously, even if it otherwise worked flawlessly, RAID is never a
    complete answer for this reason alone. Why? Because there are too many
    things (including, but not limited to both theft and power supply
    failures) that can result in the loss of everything within the physical
    computer box.

    Second, there must be a backup at a different physical location. Theft
    and fires happen. Not often, but they happen.

    Third (does not apply to you, but ....) Flash drives are not a safe
    backup media. Neither is any form of REWRITABLE optical media. Flash
    and rewritable media can lose their data for no apparent reason. It
    happens. I service computers and I teach computing at a local college.
    Believe me, data loss from these types of media happens.

    What I recommend:

    The primary drive(s) in the computer is a good quality SATA drive (my
    current recommendation is the 1TB Western Digital "Black" series of drive).

    DATA DOES NOT GO ON DRIVE C: !!! Drive C: is a relatively small (under
    100GB) primary partition of the primary physical hard drive, but that
    drive has other partition(s) for data. [In my case, XP is on C:,
    Windows 7 is on D:, and data, which is common to both OSes, is on E: and F:]

    This is important; it means moving your "My Documents" folder to another
    drive, and configuring all of your programs to not store their data on
    drive C:. The only thing on drive C: is the operating system and
    installed PROGRAMS. Setting up your system in this manner greatly
    facilitates backup and recovery.

    The computer contains secondary drive(s) which are identical or similar
    hardware to the primary drive, used for ***MANUAL*** backup (e.g. drag
    and drop copying). It goes without saying that you have to remember to
    do this, periodically.

    In addition, I have an external USB hard drive that I also use as a
    backup, also manual, by copying critical files to it periodically.

    Finally, monthly, I make backups of my critical files to ONE-TIME
    optical media (I currently use dual layer DVD, I trust only Verbatim
    brand for dual-layer media). For the most part this contains changes
    and new files only. These are stored off-site (bank safe deposit box).
    About once a year I make a complete backup of pretty much everything.
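
    Picking out the "changes and new files only" for each monthly disc is
    easy to script. A minimal Python sketch, assuming modification times
    are a good enough signal of change and that you note the time of your
    last backup; the function names are hypothetical, and DL_DVD_BYTES is
    a rounded assumption that ignores file-system overhead, so leave some
    slack.

```python
from pathlib import Path
from typing import List

DL_DVD_BYTES = 8_500_000_000  # assumption: rounded dual-layer DVD capacity


def changed_since(root: Path, last_backup: float) -> List[Path]:
    """Files under root modified after last_backup (a Unix timestamp)."""
    return sorted(p for p in root.rglob("*")
                  if p.is_file() and p.stat().st_mtime > last_backup)


def plan_discs(files: List[Path], capacity: int = DL_DVD_BYTES) -> List[List[Path]]:
    """Split files, in order, into batches that each fit on one disc."""
    discs: List[List[Path]] = []
    current: List[Path] = []
    used = 0
    for p in files:
        size = p.stat().st_size
        if size > capacity:
            raise ValueError(f"{p} is larger than one disc")
        if used + size > capacity:
            discs.append(current)  # this disc is full; start the next one
            current, used = [], 0
        current.append(p)
        used += size
    if current:
        discs.append(current)
    return discs
```

    Modification times are not foolproof: many copy tools preserve the
    original timestamp, so a file copied into the data area can look old
    and be skipped. That is one more reason to keep the yearly complete
    backup.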

    To facilitate recovery of the OS or the hard drive itself, I have an
    "image backup" of Drive C: This is stored on the secondary hard drive
    within the computer, on the external hard drive, and it is also burned
    to DVD media stored off-site.

    I have found that this protocol provides a high level of safety from
    pretty much all of the bad things that could happen, and it's what I
    recommend to my students.
     
    Barry Watzman, Aug 20, 2009
    #6
  7. Per Barry Watzman:
    What is your take on automating this process via something like
    SecondCopy?
     
    (PeteCresswell), Aug 25, 2009
    #7
  8. I'm not a fan of it, but if you like it, fine. The important thing is
    to have a backup.
     
    Barry Watzman, Aug 27, 2009
    #8
