
Blade 2000 thermal failures being reported, monitoring HW suspected to be at fault?

Discussion in 'Sun Hardware' started by George Mezzomo, Apr 29, 2014.

  1. Hello! <and please bear with me, messy/long post>

    Lately, both of my (identical) Sun Blade 2000 systems seem to be reporting temperatures outside of their operating threshold (0-95; the reading usually shows -127), and PICLenvd shuts Solaris down as soon as it finishes loading.
    Mixing CPU modules (I have, or used to have, four 1050MHz USIII-Cu modules, two for each system) gives different results. Generally, using a single module in slot 0 leads to no failure being reported...
    I have already damaged the CPU 'socket' (the thingy with golden springs) on one of the modules while tinkering with it, so now I have 3 "functional" modules and one extra for parts/spares. (BTW, does anyone know if I can buy the 'socket'? I just know it's a CIN::APSE connector; the manufacturer's [Cinch] site shows it as employed on the SPARC CPU modules: http://www.cinch.com/images/cinapse/cinapse3.gif)

    Well, getting to the point: could it be software related (currently I have no way to test much, since the machine has no network and I have no install media for other systems), or should I try to kill the problem by unsoldering the sensors from the CPU module boards (I've already figured out how to do that)?

    One of the modules already has a dead sensor (it shows "driver MAX1617 not installed" while booting); should I keep it like that? The machine is monitored constantly and is operated only in a cooled room for a limited time, for 'entertainment' purposes only...
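
    To put the -127 readings next to the 0-95 window, here is a rough Python sketch. It assumes the sensor byte is a signed 8-bit (two's complement) value at 1 degree C per bit, which is how I read the MAX1617 datasheet. The function names, the limit constants and the "stuck" heuristic are made up for illustration and are not taken from PICLenvd.

```python
# Illustrative only: names and the "stuck" heuristic are assumptions,
# not PICLenvd behaviour. Limits come from the 0-95 window quoted above.
LOW_LIMIT_C = 0
HIGH_LIMIT_C = 95


def decode_temp(raw: int) -> int:
    """Interpret one register byte as signed 8-bit two's complement, 1 degree C per LSB."""
    if not 0 <= raw <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return raw - 256 if raw & 0x80 else raw


def classify(raw: int) -> str:
    """Return 'ok' if the decoded temperature is inside the window, else 'out of range'."""
    temp = decode_temp(raw)
    if temp < LOW_LIMIT_C or temp > HIGH_LIMIT_C:
        return "out of range"
    return "ok"


def looks_stuck(raw_samples: list[int]) -> bool:
    """Heuristic: several samples, all identical and all out of range, suggest
    a sensor or bus fault rather than a real temperature."""
    if len(raw_samples) < 2:
        return False
    same_value = len(set(raw_samples)) == 1
    return same_value and classify(raw_samples[0]) == "out of range"
```

    With these assumptions a raw byte of 0x81 decodes to -127. A real CPU module in a cooled room should not be anywhere near that, so a value pinned there points at the sensor or its bus rather than at cooling.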

    Also, does anyone have info on the many jumpers of this machine, the ones the official Sun documentation available on the net doesn't explain?


    Sorry for the messed-up brainstorming, quite confused by it all
    TIA
     
    George Mezzomo, Apr 29, 2014
    #1

  2. George Mezzomo

    DoN. Nichols Guest

    A rather painfully small image.

    Personally, I would expect that desoldering the old socket and
    installing a new one would be likely to cook internal layers of the PC
    board, so I would not try it.

    I know that the Sun Fire 280R (same system board, rack mount
    chassis) wants a different fan tray installed when using the "Cu" cpu
    modules, as they need more cooling. IIRC, the fan in line with the CPU
    modules is a 14W one, and the others are smaller. (Or was it a 28W
    fan?) Anyway -- larger than the other two. The SB-2000 only has the
    two fans, so compare the one in line with the CPU modules with the one
    in line with the memory modules (or the PCI boards; I forget which
    location the second fan is in).

    But -- I presume that you have checked that the CPU modules
    don't have dust buildup in the heat sinks. Have you pulled the power
    supply out and opened it to verify that it does not have a buildup of
    dust in there? We've got a couple of cats, so the dust buildup has to
    be dealt with from time to time.

    Another thing which comes to mind is that it appears that you
    have been unbolting the heat sink modules from your CPU modules. You
    know that the thermal conductivity from CPUs to heat sink is via a
    specially treated silver filled silicone rubber pad. It includes a
    thermal grease when initially installed to maximize contact to the CPU
    and the heat sink. If you have pulled it, you should either replace the
    pad with a new freshly-coated one, or at least clean with an appropriate
    solvent (including the contact surfaces of the CPU and the heat sink)
    and re-coat it with an appropriate silicone grease. (The trick is
    finding out which grease to use.)

    Oh yes -- another thought -- have you bypassed the cover switch
    and are running it without the cover? This changes the airflow and
    cooling patterns for the worse. And you *do* have the plastic shroud in
    place to guide the fan's air through the CPU's heat sinks, I hope.

    I, personally, would want the sensors to be working, but I run
    mine full time.

    Maybe you can get away with it.

    I suspect that the undocumented jumpers were used in testing the
    board after manufacture.

    Good Luck,
    DoN.
     
    DoN. Nichols, Apr 30, 2014
    #2

  3. Trying to shorten it up:
    1-Sure it is the same. Socket is not soldered, contact-only
    2-Nah, Blade 2K is Cu CPUs only, fans are OK
    3-Machines have been thoroughly cleaned (observing anti-ESD procedures, etc, etc) <not a newbie tech, FYI :p>
    4-CPU to HS thermal interface seems good; at least my modules have no grease, just some kind of plastic sheet with a graphite pad (like the ones used in old 486-era PCs). Might try to replace it with common thermal (or silver) grease on one of the modules.
    5-It doesn't seem to be an airflow problem; results are the same, closed or sans-cover... CPU shrouds are in place

    Well, judging by the random values (sometimes stuck at -127), I'm gonna remove the dead sensor from the already quiet module, and kill another one... If I need to, I can solder new sensors back. The machine is seeing very little use lately.

    Thanks for all!

    <if anyone has any other ideas, keep the ball rolling>
     
    George Mezzomo, May 7, 2014
    #3
