On 2009-09-17, ChrisQ <> wrote:
> DoN. Nichols wrote:
>> On 2009-09-12, ChrisQ <> wrote:
>>> Hi,
>>>
>>> Am running 2 x 5 fc drives in raid 5 config, one drive goes down
>>> (maintenance mode) and needs to be replaced.
>>>
>>> Am pretty sure that on an earlier version of Sol10, I could remove the
>>> drive, insert another newfs'd and fsck'd drive in the same slot, reboot
>>> -r to see the drive, metadb the new drive and do:
>>>
>>> metareplace d0 c8t4d0s2 c8t4d0s2 for example
>>>
>>> This worked and have it documented in a logbook, but with the latest
>>> sol10 version, doing the same reports an error: old and new drives are
>>> the same.
>>
>> O.K. First -- were you using FC drives previously?
>>
>
> Don,
>
> Thanks for the detailed reply. Have been using the same 10 disk fc box
> for a couple of years now. It's an old emc box with all the drives 73 gb
> hh and s/hand.
O.K. So that did not change.
> I bought a whole bunch of drives and initially used a pc
> running the seagate disk tools to read the logs.
You can read the logs with smartctl, and with smartd you can get
e-mail notices any time the bad sector list grows. smartmontools does
not ship with Solaris 10, but you can download the source and compile it.
======================================================================
== HOME ==
The home for smartmontools is located at:
http://smartmontools.sourceforge.net/
======================================================================
And the version which I am running is 5.38, which was the latest version
earlier this summer.
Here is an example of the output of smartctl -a /dev/rdsk/c1t14d0s2
======================================================================
Katana:csu 21:37:26 # smartctl -a /dev/rdsk/c1t14d0s2
smartctl version 5.38 [sparc-sun-solaris2.10] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Device: SEAGATE SX3146807FC Version: D010
Serial number: 3HY0TDL4 7343QSVU
Device type: disk
Transport protocol: Fibre channel (FCP-2)
Local Time is: Thu Sep 17 21:38:20 2009 EDT
Device supports SMART and is Enabled
Temperature Warning Enabled
SMART Health Status: OK
Elements in grown defect list: 0
Vendor (Seagate) cache information
Blocks sent to initiator = 6914983988329
Vendor (Seagate/Hitachi) factory information
number of hours powered up = 39773.07
number of minutes until next internal SMART test = 104
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   20328350        3         0  20328353   20328353      61762.251           0
write:         0        0         2    430457     430840      10032.625           0
verify:   247574        0         0    247574     247574        826.201           0
Non-medium error count: 61440
SMART Self-test log
Num  Test              Status     segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                  number   (hours)
# 1  Background short  Completed        -     37504              - [-   -    -]
# 2  Background short  Completed        -     37338              - [-   -    -]
# 3  Background short  Completed        -     37172              - [-   -    -]
# 4  Background short  Completed        -     37006              - [-   -    -]
# 5  Background short  Completed        -     36840              - [-   -    -]
# 6  Background short  Completed        -     36674              - [-   -    -]
# 7  Background short  Completed        -     36508              - [-   -    -]
# 8  Background short  Completed        -     36347              - [-   -    -]
# 9  Background short  Completed        -     36181              - [-   -    -]
#10  Background short  Completed        -     36015              - [-   -    -]
#11  Background short  Completed        -     35849              - [-   -    -]
#12  Background short  Completed        -     35683              - [-   -    -]
#13  Background short  Completed        -     35517              - [-   -    -]
#14  Background short  Completed        -     35351              - [-   -    -]
#15  Background short  Completed        -     35185              - [-   -    -]
#16  Background short  Completed        -     35020              - [-   -    -]
#17  Background short  Completed        -     34854              - [-   -    -]
#18  Background short  Completed        -     34688              - [-   -    -]
#19  Background short  Completed        -     34522              - [-   -    -]
#20  Background short  Completed        -     34356              - [-   -    -]
======================================================================
So -- 39,773+ total hours on that drive -- which happens to have
a ufs filesystem instead of being part of a ZFS array. I don't have
enough 146 GB drives to do things that way yet. :-)
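If you want just the two numbers that matter most when sorting through a pile of second-hand drives (grown defects and power-on hours), something like the following should pull them out of the report above. This is only a sketch: the function name smart_summary is made up for illustration, and it assumes the SCSI/FC report layout shown in the smartctl output above, which may differ for other drive types or smartmontools versions.

```shell
#!/bin/sh
# smart_summary (hypothetical helper): read `smartctl -a` text on stdin
# and print the grown defect count and the power-on hours.
# Assumes the SCSI/FC report layout shown in the output above.
smart_summary() {
    awk '
        /^Elements in grown defect list:/ { defects = $NF }
        /number of hours powered up/      { hours = $NF }
        END { printf "defects=%s hours=%s\n", defects, hours }
    '
}

# Typical use (needs smartctl installed, and usually root):
#   smartctl -a /dev/rdsk/c1t14d0s2 | smart_summary

# Self-contained demonstration using two lines copied from the report:
printf '%s\n' \
    'Elements in grown defect list: 0' \
    'number of hours powered up = 39773.07' | smart_summary
# prints: defects=0 hours=39773.07
```

Looping that over every disk in the box gives a quick one-line-per-drive health table.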
To compile smartmontools, I used the Sun development tools which
come with the latest Solaris 10 set of DVDs (Studio 12, IIRC).
To get the automatic notification, you will want to edit
/etc/smartd.conf to specify the disks which you want to monitor and the
level and frequency of testing.
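As a rough illustration of what those entries might look like, the sketch below prints one candidate smartd.conf line per disk, so you can review them before appending anything to /etc/smartd.conf. The function name smartd_entries, the choice of a daily short self-test at 02:00, and mailing to root are all assumptions for the example, not settings from my own file. The directives (-d, -H, -l selftest, -s, -m) are taken from the smartd.conf man page, so check them against the version you build.

```shell
#!/bin/sh
# smartd_entries (hypothetical helper): print one candidate smartd.conf
# line for each disk given on the command line.
#   -d scsi          treat the device as SCSI (FC disks report this way)
#   -H               check the SMART health status
#   -l selftest      report new failures in the self-test log
#   -s S/../.././02  schedule a short self-test every day at 02:00
#   -m root          mail warnings to root (assumed recipient)
smartd_entries() {
    for disk in "$@"; do
        printf '%s -d scsi -H -l selftest -s S/../.././02 -m root\n' "$disk"
    done
}

smartd_entries /dev/rdsk/c1t14d0s2
# prints: /dev/rdsk/c1t14d0s2 -d scsi -H -l selftest -s S/../.././02 -m root
```

Printing the lines instead of writing the file directly keeps a typo in a device name from silently ending up in the live configuration.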
> Most of the drives were
> only 10's or a few hundred hours at most. This is the second drive that
> failed, though it may just need reformatting.
Get smartmontools, compile them, and see what they tell you about
that drive (ideally in another slot, if you have a spare slot
somewhere).
> The system is still on ufs as haven't had the time to get into zfs as
> yet and want to completely understand the implications before updating.
Understood. I played with the latest version on both a Sun
Fire 280R, and a Sun Blade 1000 before I reinstalled on my Sun Blade
2000. (With a zfs raid array, export it first, and then import it to
the new OS.) The new OS is installed on a pair of 73 GB FC-AL internal
disks, while the old one was installed on a 36 GB FC-AL, with overflow
on a 146 GB FC-AL.
Some of these days I'll take down my main server and do the
same, but not yet. I really need a time when both my wife and I won't
need to access the world for quite a few hours. :-)
> Not to mention the time involved. So, its either the management console
> or meta* commands. I'm pretty sure that I could do a metareplace into
> the same slot last time, so stumped as to why it didn't work this time...
Do you remember whether you did a devfsadm before the
metareplace last time when it worked? That may be what is needed --
though if you did a reconfigure boot that should accomplish the same
thing.
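For what it is worth, here is a sketch of the sequence I would try for a same-slot swap. Treat it as an assumption, not a tested recipe for your box: it uses the -e form of metareplace, which per the metareplace man page re-enables a component in place instead of naming an old and a new device, and that may be what the "old and new drives are the same" error is pushing you toward. The helper names run and replace_in_place are made up, d0 and c8t4d0s2 are the names from your post, and the script only prints the commands unless you set DRY_RUN=0. It also assumes the new disk already carries the same slice layout as the old one.

```shell
#!/bin/sh
# DRY_RUN defaults to 1, so commands are only printed until you opt in.
DRY_RUN=${DRY_RUN:-1}

# run (hypothetical helper): print the command, or execute it if DRY_RUN=0.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# replace_in_place (hypothetical helper): metadevice first, slice second.
replace_in_place() {
    metadev=$1
    slice=$2
    run devfsadm                            # rebuild /dev links for the new disk
    run metareplace -e "$metadev" "$slice"  # re-enable the component in the same slot
    run metastat "$metadev"                 # check the state and resync progress
}

replace_in_place d0 c8t4d0s2
# prints:
#   would run: devfsadm
#   would run: metareplace -e d0 c8t4d0s2
#   would run: metastat d0
```

If the failed disk also held state database replicas, those would need to be deleted and re-added with metadb as a separate step, which this sketch leaves out.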
And I think that I mentioned before -- zfs at least does not
care what slots the drives from its arrays are in -- it finds them and
uses them wherever they happen to be. (But be careful when you replace
a drive not to accidentally expand the raid array onto one of the hot
spares, instead of just letting it use the hot spare on its own until
you do the "replace" with the new drive. I accidentally did that once,
and there was no way to back out of using that extra drive other than
backing the data up to tape and rebuilding the array. You can tell it
to let go of a drive used in a mirror, but not to shrink a 6-drive RAID
array back to the desired 5-drive one.) :-)
Good Luck,
DoN.
--
Email: <> | Voice (all times): (703) 938-4564
(too) near Washington D.C. |
http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---