This forum section is a read-only archive which contains old newsgroup posts.

Sun Enterprise 420R crashed

Discussion in 'Sun Hardware' started by Rob Lancia, Jan 8, 2004.

  1. Rob Lancia

    Rob Lancia Guest

    Our primary production Oracle server crashed twice over the holidays
    but each time came back up on its own. Before I could get in and
    devote some time to looking into this, it crashed again but this time
    did not restart. I've rolled Oracle over to the backup server to buy
    myself some time but I am seeking some assistance for the next
    step(s).

    ["Why ignore the first two times, and why not call Sun?" you may be
    asking. Well: (1) my company is winding down through bankruptcy, (2) the
    former and formidable IT department is now just me, (3) we have no
    support contracts, and (4) our current load is very small with a
    limited future.]

    The console recorded this as it was going down:

    --------------------------------------------
    Fatal Reset
    Psycho UE AFSR 1418bb800000
    BLK:1 UPA MID:1b DW_OFFSET:5 BYTEMASK:1418
    S_DWR:0 S_DRD:0 S_PIO:0 P_DWR:0 P_DRD:0 P_PIO:0

    Psycho CE AFSR 9333184000000
    BLK:0 UPA MID:4 DW_OFFSET:4 BYTEMASK:3331
    E_SYND:9 S_DWR:0 S_DRD:0 S_PIO:0 P_DWR:0 P_DRD:0 P_PIO:0

    SC Control 2000000
    EWP:0 IAP:0 FATAL:1 WAKEUP:0 BXIR:0 BPOR:0 SXIR:0 SPOR:0 POR:0

    P0 Status 1e00000
    XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
    IADDR:0 FATAL:0

    P1 Status 49e20000
    XNOF:1 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:1 IPRTY:0 IPORT:0
    IADDR:1 FATAL:0

    P2 Status 1e00000
    XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
    IADDR:0 FATAL:0

    P3 Status 1e00000
    XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
    IADDR:0 FATAL:0

    MC CTL0 7e892e
    Refwdth:6 Refstagger:1 CASdly:1 CASwdth:4 RefInt:744 RefEnbl:1
    bank_present:3

    MC RASCAS 5b33
    CAS_L.wr:3 CAS_L.rd:6 RAS.precharge:4 more.RAS_L:1 CAS.width:3
    RAS.width:1

    CPU AFSR 0
    P_SYND:0 ETS:0 CE:0 UE:0 EDP:0 WP:0 CP:0 LDP:0
    BERR:0 TO:0 IVUE:0 ETP:0 ISAP:0 PRIV:0 ME:0

    SDBH AFSR 0
    SDBL AFSR 0
    ---------------------------------------

    Any insight would be very helpful. I can spend limited time and
    limited money on this and would prefer to keep everything running "as
    was" until the very end, but since there is no long-term future here
    management is not willing to devote considerable money or time to
    this. All I'm looking to do is add a little clarity to what might be
    going on, then let management decide what, if anything, we should do
    next.

    Thanks for any and all help.
    Rob.
     
    Rob Lancia, Jan 8, 2004
    #1

  2. Rob Lancia

    cthulhu Guest

    Rob,
    I snipped the output, so you wouldn't have to scroll through it all
    to get to my reply.
    I did a quick search on SunSolve, and it seems that there may be a bad
    CPU in the E420. Also, the 420s have a 3-year hardware warranty, so
    unless you got one of the first ones sold, you may still be able to get
    it fixed by Sun.

    I found this case with a fatal reset, and it turned out to be a bad CPU.
    Hope this helps.
    Alan

    Psycho UE AFSR aebb23000000
    BLK:0 UPA MID:3 DW_OFFSET:1 BYTEMASK:aebb
    S_DWR:0 S_DRD:0 S_PIO:0 P_DWR:0 P_DRD:0 P_PIO:0

    Psycho CE AFSR f1268f1f800000
    BLK:1 UPA MID:1f DW_OFFSET:0 BYTEMASK:268f
    E_SYND:f1 S_DWR:0 S_DRD:0 S_PIO:0 P_DWR:0 P_DRD:0 P_PIO:0

    SC Control 2000000
    EWP:0 IAP:0 FATAL:1 WAKEUP:0 BXIR:0 BPOR:0 SXIR:0 SPOR:0 POR:0

    P0 Status c1e00000
    XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0 IADDR:1
    FATAL:1

    P2 Status 1e00000
    XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0 IADDR:0
    FATAL:0

    MC CTL0 5e892e
    Refwdth:6 Refstagger:1 CASdly:1 CASwdth:4 RefInt:744 RefEnbl:1 bank_present:2

    MC RASCAS 5b33
    CAS_L.wr:3 CAS_L.rd:6 RAS.precharge:4 more.RAS_L:1 CAS.width:3 RAS.width:1

    CPU AFSR 1a8010000
    P_SYND:0 ETS:1 CE:0 UE:0 EDP:0 WP:0 CP:0 LDP:0
    BERR:0 TO:1 IVUE:0 ETP:1 ISAP:0 PRIV:1 ME:1

    SDBH AFSR 0
    SDBL AFSR 0

    Work Around: none
    Integrated in Releases: (none)
    Duplicate of: (none)
    Patch ID: (none)
    See Also: (none)
    Summary: The fatal resets were being caused by a faulty CPU module.
     
    cthulhu, Jan 8, 2004
    #2

  3. Rob Lancia

    Fr3aK3r Guest

    Alan is right. Check the output and you'll see that SC Control reports
    FATAL:1. A little further down, P0 (processor 0) also reports FATAL:1.
    This means a fatal hardware reset has occurred on processor 0.
    You really want to get that CPU replaced, but also have someone check the
    slot the processor sits in. If the slot is damaged, it might mean a new
    mainboard.
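Reading these dumps by eye mostly comes down to finding which block prints FATAL:1 and which CPU's status fields differ from the others. Below is a minimal Python sketch of that, assuming the console text is laid out as in the first post (a header line such as "P1 Status 49e20000" followed by KEY:VALUE fields). It only groups the fields as printed and does not decode the raw register bits; the "odd one out" comparison is an ad hoc heuristic, not a Sun diagnostic procedure.

```python
import re
from collections import Counter

# Excerpt copied from the console output in the first post.
DUMP = """\
SC Control 2000000
EWP:0 IAP:0 FATAL:1 WAKEUP:0 BXIR:0 BPOR:0 SXIR:0 SPOR:0 POR:0

P0 Status 1e00000
XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
IADDR:0 FATAL:0

P1 Status 49e20000
XNOF:1 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:1 IPRTY:0 IPORT:0
IADDR:1 FATAL:0

P2 Status 1e00000
XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
IADDR:0 FATAL:0

P3 Status 1e00000
XNOF:0 C1OF:0 C0OF:0 MC1Q:6 MC0Q:3 MC1OF:0 MC0OF:0 IPRTY:0 IPORT:0
IADDR:0 FATAL:0
"""

FIELD = re.compile(r"([\w.]+):(\w+)")  # e.g. "MC0OF:1", "CAS_L.rd:6"


def parse_dump(text):
    """Map each header (e.g. 'P1 Status') to its printed KEY:VALUE fields."""
    blocks, header = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if ":" not in line:
            # Header line: a block name followed by the raw register word in hex.
            header = line.rsplit(" ", 1)[0]
            blocks[header] = {}
        elif header is not None:
            blocks[header].update(FIELD.findall(line))
    return blocks


def fatal_blocks(blocks):
    """Headers of the blocks that print FATAL:1."""
    return [name for name, fields in blocks.items() if fields.get("FATAL") == "1"]


def outlier_fields(blocks):
    """For the per-CPU 'Pn Status' blocks, list fields whose value differs
    from the value most CPUs print (an ad hoc 'odd one out' heuristic)."""
    cpus = {n: f for n, f in blocks.items() if n.endswith("Status")}
    result = {}
    for key in sorted({k for f in cpus.values() for k in f}):
        usual, _ = Counter(f.get(key) for f in cpus.values()).most_common(1)[0]
        for name, fields in cpus.items():
            if fields.get(key) != usual:
                result.setdefault(name, []).append(f"{key}:{fields.get(key)}")
    return result


if __name__ == "__main__":
    blocks = parse_dump(DUMP)
    print(fatal_blocks(blocks))    # ['SC Control']
    print(outlier_fields(blocks))  # {'P1 Status': ['IADDR:1', 'MC0OF:1', 'XNOF:1']}
```

Run on the excerpt from the first post, it flags only SC Control as FATAL:1 (none of P0 through P3 print FATAL:1 there) and singles out P1, whose XNOF, MC0OF, and IADDR fields read 1 while the other CPUs show 0.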
     
    Fr3aK3r, Jan 9, 2004
    #3
  4. Rob Lancia

    Scott Howard Guest

    Most likely you've got a dead system board. It's possible that the
    error is being caused by a CPU, with the most likely candidate being
    CPU1, but the system board is far more likely given the Fatal Reset
    details provided.

    If you're not already running the latest firmware, you should try to
    upgrade using patch 109082-05 (this may not be possible if you can't get
    the machine to boot), and then do a maximum-level diagnostic run by
    entering the following at the OK prompt:

        setenv diag-level max
        setenv diag-switch? true
        power-off
        <<Power machine back on>>


    If that doesn't work, remove CPU1 and see if the problem goes away. If
    it does, move another CPU into slot 1 and see if the problem recurs (to
    check whether it's the CPU or the slot/motherboard).

    Hopefully all of that will get you somewhere, but at the end of the day
    you'll most likely need to call Sun and have them come out and fix it
    with a new motherboard.

    As someone else pointed out, it may still be under warranty. The E420Rs
    have a 3-year warranty but started shipping in November 1999, so you
    may or may not be lucky...
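The warranty question is simple date arithmetic: with a three-year term and first shipments in November 1999, the earliest units ran out of coverage in November 2002, so as of this thread (January 2004) only machines whose coverage started after roughly January 2001 would still qualify. A small Python sketch of that check follows. It assumes coverage starts on a single known date and ends on the third anniversary; the specific start dates are placeholders rather than anyone's actual ship date, and Sun's real terms may differ.

```python
from datetime import date

WARRANTY_YEARS = 3  # term quoted in the thread; assumed, not checked against Sun's terms


def warranty_end(start):
    """Third anniversary of `start`; a Feb 29 start rolls to Feb 28."""
    try:
        return start.replace(year=start.year + WARRANTY_YEARS)
    except ValueError:  # Feb 29 landing in a non-leap year
        return start.replace(year=start.year + WARRANTY_YEARS, day=28)


def still_covered(start, today):
    """True if `today` is before the warranty end date (end assumed exclusive)."""
    return today < warranty_end(start)


if __name__ == "__main__":
    today = date(2004, 1, 10)  # date of this reply
    print(warranty_end(date(1999, 11, 1)))          # 2002-11-01
    print(still_covered(date(1999, 11, 1), today))  # False
    print(still_covered(date(2001, 6, 1), today))   # True
```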

    Scott.
     
    Scott Howard, Jan 10, 2004
    #4
  5. I have the processor for $400 each if anyone needs some.


     
    Mike Beckmann, Jan 15, 2004
    #5
  6. Rob Lancia

    Luigi Guest

    For what it's worth, I scanned through my old status reports, since I had
    run into this message before, and Sun indicated at the time that the Psycho
    chip was bad. Supposedly the chip is embedded on the system board, so the
    entire board had to be replaced.
     
    Luigi, Jan 17, 2004
    #6
