Transport errors on SunBlade 2000

Discussion in 'Sun Hardware' started by rvandolson, Dec 7, 2006.

  1. rvandolson

    I have a remote (a couple hundred miles away) SunBlade 2000 server
    with two internal Sun 72GB FC-AL drives. The machine has been acting
    strangely lately with I/O problems, and after a reboot it needs an
    fsck before it will come back up.

    Excerpts from the logs:

    Dec 5 18:09:01 rococo scsi: WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c50567b30,0 (ssd1):
    Dec 5 18:09:01 rococo SCSI transport failed: reason 'timeout': retrying command
    Dec 5 18:11:06 rococo scsi: WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c507acf7e,0 (ssd0):
    Dec 5 18:11:06 rococo SCSI transport failed: reason 'timeout': retrying command

    Dec 5 20:04:21 rococo scsi: Unexpected SCSI status received: 0x4
    Dec 5 20:04:21 rococo last message repeated 1 time
    Dec 5 20:04:22 rococo scsi: WARNING: /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c507acf7e,0 (ssd0):
    Dec 5 20:04:22 rococo transport rejected fatal error

    The problem recurs from time to time and is generally accompanied by
    user-space symptoms, such as I/O errors when running ls on a
    filesystem on the drive.
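    To see which disk each warning belongs to, one option is to tally the
    ssd instance and WWN from the device path in each WARNING line. This
    is a minimal Python sketch that assumes the log text is available as
    a string (for example, read from /var/adm/messages). The function
    name count_warnings_by_disk is hypothetical, and the regular
    expression only matches the device path format shown in the excerpt
    above.

```python
import re
from collections import Counter

# Matches device paths like
#   /pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c50567b30,0 (ssd1):
# capturing the WWN after "ssd@w" and the driver instance in parentheses.
DEVICE_RE = re.compile(r"ssd@w([0-9a-fA-F]+),\d+ \((ssd\d+)\):")


def count_warnings_by_disk(log_text: str) -> Counter:
    """Count device-path mentions per (instance, WWN) pair (hypothetical helper)."""
    counts: Counter = Counter()
    for match in DEVICE_RE.finditer(log_text):
        wwn, instance = match.group(1), match.group(2)
        counts[(instance, wwn)] += 1
    return counts


# Sample text taken from the log excerpt in this post.
SAMPLE_LOG = """\
Dec 5 18:09:01 rococo scsi: WARNING:
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c50567b30,0 (ssd1):
Dec 5 18:09:01 rococo SCSI transport failed: reason 'timeout':
retrying command
Dec 5 18:11:06 rococo scsi: WARNING:
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c507acf7e,0 (ssd0):
Dec 5 18:11:06 rococo SCSI transport failed: reason 'timeout':
retrying command
Dec 5 20:04:22 rococo scsi: WARNING:
/pci@8,600000/SUNW,qlc@4/fp@0,0/ssd@w2100000c507acf7e,0 (ssd0):
Dec 5 20:04:22 rococo transport rejected fatal error
"""

if __name__ == "__main__":
    # Print one line per disk, sorted by instance name.
    for (instance, wwn), n in sorted(count_warnings_by_disk(SAMPLE_LOG).items()):
        print(f"{instance} (WWN {wwn}): {n} warning(s)")
```

    Run against the excerpt above, it reports two warnings for ssd0 and
    one for ssd1, so both disks on the qlc@4 loop are affected.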

    bash-2.05# iostat -En
    c0t6d0 Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
    Vendor: TOSHIBA Product: DVD-ROM SD-M1711 Revision: 1005 Serial No:
    Size: 0.00GB <0 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
    Illegal Request: 1 Predictive Failure Analysis: 0
    c1t2d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 295
    Vendor: SEAGATE Product: ST373307FSUN72G Revision: 0307 Serial No: 0345B6DQDX
    Size: 73.40GB <73400057856 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 1 Recoverable: 0
    Illegal Request: 0 Predictive Failure Analysis: 0
    c1t1d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 121
    Vendor: SEAGATE Product: ST373307FSUN72G Revision: 0307 Serial No: 0334B1S0EL
    Size: 73.40GB <73400057856 bytes>
    Media Error: 0 Device Not Ready: 0 No Device: 4 Recoverable: 0
    Illegal Request: 0 Predictive Failure Analysis: 0

    As you can see, both are ST373307FSUN72G drives running firmware
    revision 0307. Both show a large number of transport errors (295 on
    c1t2d0, 121 on c1t1d0) as well as hard errors (2 and 53
    respectively).
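    The per-device counters can also be extracted from iostat -En output
    programmatically, which helps when checking several machines. This is
    a minimal Python sketch that assumes the header-line format shown
    above (device name followed by the Soft/Hard/Transport counts). The
    function names parse_error_counters and suspect_devices are
    hypothetical, and "suspect" here simply means a nonzero hard or
    transport error count.

```python
import re

# Matches iostat -En header lines such as
#   c1t2d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 295
HEADER_RE = re.compile(
    r"^(\S+)\s+Soft Errors:\s*(\d+)\s+Hard Errors:\s*(\d+)\s+Transport Errors:\s*(\d+)",
    re.MULTILINE,
)


def parse_error_counters(iostat_text: str) -> dict:
    """Return {device: {"soft": n, "hard": n, "transport": n}} (hypothetical helper)."""
    counters = {}
    for m in HEADER_RE.finditer(iostat_text):
        soft, hard, transport = (int(m.group(i)) for i in (2, 3, 4))
        counters[m.group(1)] = {"soft": soft, "hard": hard, "transport": transport}
    return counters


def suspect_devices(counters: dict) -> list:
    """Devices with any hard or transport errors, sorted by name."""
    return sorted(dev for dev, c in counters.items() if c["hard"] or c["transport"])


# Header lines taken from the iostat -En output in this post.
SAMPLE = """\
c0t6d0 Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
c1t2d0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 295
c1t1d0 Soft Errors: 0 Hard Errors: 53 Transport Errors: 121
"""

if __name__ == "__main__":
    counters = parse_error_counters(SAMPLE)
    for dev in suspect_devices(counters):
        c = counters[dev]
        print(f"{dev}: hard={c['hard']} transport={c['transport']}")
```

    On the output above it flags c1t1d0 and c1t2d0 and skips the DVD-ROM
    (c0t6d0), which has only a single soft error.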

    The defect list is at 0 for both drives, which leads me to think the
    drives themselves are not going bad. Besides, it seems unlikely that
    both would develop problems at the same time.

    The OS is Solaris 9. I can provide the output of showrev -p on request.

    Potential causes for this that I can think of:

    1. Drives not seated properly. I'm trying to determine whether the
    issue only occurs under high I/O load, and I'm also trying to get
    someone on site to check the drives.
    2. A termination issue.
    3. Drive firmware needs an upgrade (is there anything newer than 0307
    for this drive?).
    4. Heat.
    5. The OS needs a patch.
    6. Bad cables.
    7. Bad drives.

    Am I missing anything? Has anyone run into this type of issue before?
    Any advice would be appreciated.

    Thanks,
    Ray
     
    rvandolson, Dec 7, 2006

  2. Trinean

    > Potential causes for this that I can think of:

    To answer #3: 0407 is the latest firmware I can find.

    Trinean
     
    Trinean, Dec 7, 2006

  3. rvandolson

    I got Sun to take a look at this. It turns out to be a bad GBIC
    (FC-AL), and the mainboard will need to be replaced.

    Ray
     
    rvandolson, Dec 8, 2006
