Motherboard Forums



more trouble with Sun Blade 1000

 
 





















DoN. Nichols

 
      07-05-2009, 10:34 PM


On 2009-07-04, glennklockwood <> wrote:
> Hi again.
>
> I've been having a peculiar problem with my Sun Blade 1000, which I
> just upgraded to dual 900MHz US-III Cu processors. The system worked
> fine for a few days, but today I found that when I pressed the power
> button, it powered on for two to three seconds (fans start, front
> lights up), then it shut off. There isn't enough time for anything to
> come across the serial console, and subsequent attempts to power on
> result in the same thing happening. I was finally able to get the
> system to power on and boot, but almost immediately I got a thermal
> shutdown notice (citing a temperature of 127--both fans were working
> though).


Hmm ... that is rather low for a shutdown -- depending on which
temperature system is in use. In my Sun Fire 280R (which has the RSC
(Remote System Control) card for remote monitoring) the temperatures
are;
             F     C    Warn F   Warn C   Fail F   Fail C
RSC card     91    33   212      100      230      110
CPU0         131   55   199      93       203      95
CPU1         127   53   199      93       203      95

This is with two 900 MHz Cu CPUs.
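
As a rough check on the units question, here is a minimal Python sketch. It assumes the CPU warn/fail thresholds from the 280R table above (199 F and 203 F) also apply to the machine in question, which is only a guess; the function names are made up for illustration.

```python
# Thresholds taken from the Sun Fire 280R table above (CPU rows).
# Assumption: a Blade 1000 uses similar limits; this is not confirmed.
CPU_WARN_F = 199
CPU_FAIL_F = 203


def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def c_to_f(deg_c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return deg_c * 9 / 5 + 32


def classify(reading, unit="F"):
    """Return 'ok', 'warn' or 'fail' for a CPU temperature reading."""
    deg_f = reading if unit == "F" else c_to_f(reading)
    if deg_f >= CPU_FAIL_F:
        return "fail"
    if deg_f >= CPU_WARN_F:
        return "warn"
    return "ok"


if __name__ == "__main__":
    # The reported shutdown cited "127" without a unit.
    print(round(f_to_c(127)))   # 127 F expressed in Celsius
    print(classify(127, "F"))   # normal running temperature if Fahrenheit
    print(classify(127, "C"))   # far past the fail limit if Celsius
```

Read as Fahrenheit, 127 is an ordinary running temperature; read as Celsius, it is well past the failure limit, which is why the unit matters here.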

The actual CPU temperatures in the spare Sun Fire 280R (for experimentation)
which has 750 MHz (non-Cu) CPUs are somewhat hotter.

> I should point out that my other Blade 1000 showed very similar power-
> on difficulties with its dual 750MHz processors. I finally got fed up
> with having to hit the power button several times before it would stay
> on, and I swapped the HDs and memory from that machine to this current
> one which also started with dual 750's. This power-on problem only
> started happening again after the upgrade to 900's. With the older
> machine, though, I never got thermal warnings after finally powering
> on; in fact, if I could actually get it to boot, it would stay up for
> days without any problems.


Note that for the Sun Fire 280R (same system board and CPUs) the
fan tray has three fans -- one for the PCI and UPA cards, one for the
CPUs, and one for the memory DIMMs. And when upgrading from the non-Cu
CPUs to Cu types you are warned to replace the fan tray with a later
model. I've compared both types and find the only visible difference is
that the later model (for the Cu CPUs) has a 14W fan in the center (CPU)
position instead of a 7W fan. Note that the fans in a SB-2000 do not
seem to be any higher power than those in the SB-1000.

> I've seen this sort of behavior (machine will power everything for a
> few seconds before shutting down) in an x86 machine that had a faulty
> motherboard, but this is now happening in two separate motherboards so
> I am skeptical that this is the case here. I found that unplugging
> the DVD and disks did not help anyway, so the problem must lie with
> the CPUs, motherboard, memory, or power supply.


Or -- possibly the CPU to system board connection. It is
possible that some dust is obscuring contacts which allow reading the
temperatures on the CPUs, and as a result the CPUs are *sensed* as
running hot while they are not *really* hot. I would suggest removing
each CPU, spraying the connectors on both the CPUs and the system board
with a *good* contact cleaner, and re-seating the CPUs. Try one CPU at a
time in slot 0 and see if one has the problem and the other does not.

If one has the problem and the other does not, is it possible
that someone removed the heat sinks from that CPU module and then
re-attached them -- perhaps reducing the thermal conductivity while
doing so? I've seen one eBay vendor who seems to remove the heatsinks
to photograph the actual CPU chip -- something which *I* would not do,
and I would not buy from that vendor. If the photos show the CPU chip
instead of the barcode label, skip that vendor.

Also -- which style of torque wrench does your system have?
There are two -- one (the older style) is the wire bent into a circle
and you tell the torque limit by the ends of the circle touching, and
the other (newer style) is a torque limiting screwdriver with a dayglo
green handle which slips with a click when the proper torque is reached.

I have seen Sun documents (in PDF format) which suggest that the
later design is far to be preferred. The green torque limiting
screwdriver fits in a clip in the cage where the DVD drive, the smart
card and the floppy drives are all mounted. Look for a dayglo green
the same as the ring around the Robertson (square drive) sockets in the
CPU modules.

The old style torque driver lives in a green plastic carrier
which slides in between the two internal disk drives. Older SB-1000s
have that style. Newer ones and older SB-2000s have the torque
screwdriver style. Newer SB-2000s come without a torque wrench at all,
with the assumption that when you buy new CPUs from Sun, you will
receive a new torque driver with each CPU. If you don't have *any*
torque limiting screwdriver, Utica makes some very nice adjustable ones
(quite expensive), and you want one which will reach 5 inch-pounds IIRC.
The screwdrivers from Utica come in both an inch-pounds range and an
inch-ounces range. The inch-pounds ones go down to 6 inch-pounds, but
you can fudge it to one step lower to get the five you need. If you use
an inch-ounces model instead, multiply the five inch-pounds by 16 to get
80 inch-ounces.
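
For anyone shopping for a driver rated in other units, the arithmetic above can be sketched in Python. The 5 inch-pound figure is the "IIRC" value from this post, not a confirmed Sun specification, and the function names are only illustrative.

```python
OUNCES_PER_POUND = 16
NEWTON_METRES_PER_INCH_POUND = 0.1129848  # standard conversion factor


def inch_pounds_to_inch_ounces(inch_pounds):
    """Convert a torque in inch-pounds to inch-ounces."""
    return inch_pounds * OUNCES_PER_POUND


def inch_pounds_to_newton_metres(inch_pounds):
    """Convert a torque in inch-pounds to newton-metres."""
    return inch_pounds * NEWTON_METRES_PER_INCH_POUND


if __name__ == "__main__":
    target = 5  # inch-pounds, as recalled in the post (unverified)
    print(inch_pounds_to_inch_ounces(target))
    print(round(inch_pounds_to_newton_metres(target), 3))
```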

It is quite important to follow the instructions for removing
and replacing the CPU modules. If you don't, you can damage the
connectors on either the system board or the CPU modules. Also, trying
to remove them by turning too many turns at one end before going to the
other can cause the circlip to pop off the jack screw and get lost
inside the system along with the thrust washers.

> Does anyone have any ideas? I was hoping that the power-on issues I
> was having with my first Blade 1000 would be fixed by moving the ram/
> disks to this new one, but now they've cropped up again and I don't
> know what component would be at fault here.


CPUs and their mounting (connectors, torque, and proper mounting
of heat sinks to CPUs) are most likely the cause if you are quickly
getting overtemperature warnings and the fans are running.

Also -- for other problems (assuming that you have eight DIMMs
in the system) pull out four and see what happens. If that still gives
the same problems, pull the remaining four and replace them with the
four which you first pulled.

Once you have a group of four identified as problematical, you
can swap out one at a time with others (assuming that both sets of four
are the same size) to identify the actual bad DIMMs. I found that two
512 MB DIMMs were flakey out of eight -- and one flakey one was in each
set, so I had to do a lot of swapping to get a known good set before
testing the others to truly identify the bad ones.
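
The swapping procedure above can be sketched in Python. This is only an illustration under stated assumptions: DIMMs are installed in groups of four, a bad DIMM fails every time it is installed (truly flakey parts may need repeated runs), and `boots_ok` is a hypothetical stand-in for "install exactly these four and see whether the machine runs cleanly".

```python
from itertools import combinations


def find_bad_dimms(dimms, boots_ok, group_size=4):
    """Return the set of DIMMs that cause a failure.

    dimms: list of DIMM labels.
    boots_ok: callable taking a tuple of `group_size` DIMMs and returning
        True if the machine runs cleanly with exactly that set installed.
    """
    # Step 1: try groups until one runs cleanly; that is the known-good set.
    good = next(
        (group for group in combinations(dimms, group_size) if boots_ok(group)),
        None,
    )
    if good is None:
        raise RuntimeError("no fault-free group found")

    # Step 2: swap each remaining DIMM into the known-good set, one at a time.
    bad = set()
    for candidate in dimms:
        if candidate in good:
            continue
        trial = (candidate,) + good[1:]
        if not boots_ok(trial):
            bad.add(candidate)
    return bad


if __name__ == "__main__":
    # Simulated case from the post: one bad DIMM in each set of four.
    all_dimms = ["A1", "A2", "A3", "A4", "B1", "B2", "B3", "B4"]
    faulty = {"A2", "B3"}
    print(sorted(find_bad_dimms(all_dimms, lambda g: not (set(g) & faulty))))
```

Trying every combination is more swapping than anyone wants to do by hand, but it shows why one bad DIMM in each set forces extra work before a known-good set exists.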

You have my thoughts above.

Good Luck,
DoN.

--
Email: <> | Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---
 
 
DoN. Nichols

 
      07-05-2009, 10:39 PM
On 2009-07-05, Huge <> wrote:
> On 2009-07-05, Dave <> wrote:
>> glennklockwood wrote:
>>> Hi again.
>>>
>>> I've been having a peculiar problem with my Sun Blade 1000, which I
>>> just upgraded to dual 900MHz US-III Cu processors. The system worked
>>> fine for a few days, but today I found that when I pressed the power
>>> button, it powered on for two to three seconds (fans start, front
>>> lights up), then it shut off. There isn't enough time for anything to
>>> come across the serial console, and subsequent attempts to power on
>>> result in the same thing happening.

>
> I have exactly the same problem with my (now retired) S/B 2000. Tightening
> the CPU mounting screws reduced it from "every time" to "occasionally" so
> I really ought to take them out and reseat them.


Do you use the torque limiting screwdriver for installing the
CPUs? It is really important to do so -- and to follow the instructions
in the pamphlet located in a box on the side cover -- or located on the
airflow guide for a Sun Fire 280R system.

> BTW, the machine is for sale (I'm in the UK) if anyone wants it.


Not in the UK -- otherwise, I might take a risk with it.

Good Luck,
DoN.

 
 
DoN. Nichols

 
      07-05-2009, 11:08 PM
On 2009-07-05, glennklockwood <> wrote:
> On Jul 5, 11:05 am, glennklockwood <glennklockw...@gmail.com> wrote:
>> On Jul 5, 2:36 am, Andreas Wacknitz <a.wackn...@gmx.de> wrote:


[ ... ]

>> > Blade 1000/2000 CPU modules are very sensitive. Have you used the
>> > provided tool to attach them? In my experience your kind of problems is
>> > caused by badly fixed CPUs. You should try to remove them and insert
>> > them again (one by one).

>>
>> > Regards
>> > Andreas

>>
>> As an update, it in fact was an issue with a CPU module. In this
>> case, apparently one of the 900MHz processors I bought was bad. I did
>> try re-seating and re-torquing each module using the cylindrical
>> torque wrench provided in the SB2k's, and I found that the system
>> would not boot with one specific module in either CPU0 or CPU1. It's
>> strange that the module did work the first few times after it arrived
>> but broke very soon after. The guy who sold it to me must've cracked
>> an egg into its radiator.


I'm willing to bet that it is from the eBay vendor who takes the
heat sinks off to photograph the CPU chip's identifier -- and in the
process has reduced the efficiency of the heat conductivity from the
chip to the heat sink.

>> Now I am left with the choice of using one working 900MHz module or a
>> pair of 750MHz modules. Given the electricity that these things pull
>> and the trouble I've been having with hardware, though, maybe it's
>> time I just sold these two Blade 1000's off and save the money I've
>> been spending on their maintenance on a new x86.


[ ... ]

> As an afterthought, has anyone tried swapping the physical chip from a
> US III Cu module onto a non-Cu module? I have quite a few extra
> (working) 750MHz modules, and if my 900MHz chip itself is still
> working, it would be worth my while to transplant it onto one of these
> other working modules and just toss the non-Cu chip. There are no
> apparent physical differences between the Cu and non-Cu modules, but
> I'd hate to damage something further by trying something which should
> not be done.


Hmm ... the 900 MHz Cu and the 750 MHz non-Cu modules have the
same cache size. But I'm not sure whether the Cu chips might work at a
lower voltage, and the module might include voltage regulators to feed
the chips.

I've never had the heat sink off, so I don't know whether the
CPU is in a socket or is soldered to the board. And I'm not sure what
torque limits to set the heat sink screws to. It looks as though there
are springs under the heads of the screws and if the screws have limited
travel the springs may simply provide the proper compression -- and I
suspect the tightening sequence for these is also critical as is the
sequence for the jackscrews installing the CPU modules. Anyway --
assuming that the CPU chip is socketed, and that there are no problems
with the mounting screws and springs, make sure that whatever is between
the CPU and the heatsink is properly replaced. If it is one of those
silicone rubber heat conductor pads, you should be able to reuse it. If
it uses a heat sink compound, you should clean the surface and renew the
compound when you swap CPU chips.

Let me know how it goes. I've got a damaged module with a good
900 MHz Cu CPU and several 750 MHz CPUs.

Good Luck,
DoN.

 
 
glennklockwood

 
      07-06-2009, 01:42 AM
On Jul 5, 6:08 pm, "DoN. Nichols" <dnich...@d-and-d.com> wrote:
> [ ... ]
>
> Let me know how it goes. I've got a damaged module with a good
> 900 MHz Cu CPU and several 750 MHz CPUs.
>
> Good Luck,
> DoN.
>
> --
> Email: <dnich...@d-and-d.com> | Voice (all times): (703) 938-4564
> (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
> --- Black Holes are where God is dividing by zero ---


Apparently you cannot drop an UltraSPARC III Cu processor into the CPU
module of a non-Cu processor. I swapped the processor dies, plugged
the new module with the US III Cu back in, and hit the power button.
Surprisingly, the system did not silently power off after three
seconds, which is what it used to do. Rather, it stayed powered up
for about ten seconds, beeped three times, then shut down. Just for
the record, I have been using the newer torque wrench from the Blade
2000's (the 'dayglo green' variety) and followed all of the suggested
methods (tightening both captive screws evenly, etc)

What I take from this is that the module was the faulty part and that
my US III Cu processor is probably fine, but the Cu and non-Cu modules
are incompatible. However, I couldn't find any beepcodes in the Sun
documentation for the Blade 1000/2000 series, so I'm not really sure
where the system is finding the fault.

Also for those who may be interested, the UltraSPARC III processors
are pinless and actually do not have any thermal compound between the
processor die and heat sink. Instead they have a clear plastic sheet
with a square of aluminum foil in the center which makes thermal
contact with both sides. Removing the heat sink and processor follows
a process identical to removing the processor from an Enterprise 4500
CPU tray, and I'd imagine the torque stated in the E4500 service
manual for US II CPUs is the same for US III since the heat sinks and
processor/board contacts are identical. Unscrew the heat sink's
captive screws on the diagonals at half-turn intervals until they all
loosen up, then pull the heat sink off and pry the processor die from
the plastic surround.

Anyway, if anyone knows what three beeps mean, please let me know.
Otherwise, I guess I am just stuck with one US III Cu for now. Thanks
for everyone's advice with this.

glenn k. lockwood

ps. I took photos of the dismantled CPU module; if this is of interest
to any aspiring sparc hackers, send me an email.
 
 
DoN. Nichols

 
      07-06-2009, 04:11 AM
On 2009-07-06, glennklockwood <> wrote:
> On Jul 5, 6:08 pm, "DoN. Nichols" <dnich...@d-and-d.com> wrote:
>> [ ... ]
>>
>> Let me know how it goes. I've got a damaged module with a good
>> 900 MHz Cu CPU and several 750 MHz CPUs.
>>
>> Good Luck,
>> DoN.


[ ... ]

> Apparently you cannot drop an UltraSPARC III Cu processor into the CPU
> module of a non-Cu processor. I swapped the processor dies, plugged
> the new module with the US III Cu back in, and hit the power button.
> Surprisingly, the system did not silently power off after three
> seconds, which is what it used to do. Rather, it stayed powered up
> for about ten seconds, beeped three times, then shut down.


Sigh!

> Just for
> the record, I have been using the newer torque wrench from the Blade
> 2000's (the 'dayglo green' variety) and followed all of the suggested
> methods (tightening both captive screws evenly, etc)


Good!

> What I take from this is that the module was the faulty part and that
> my US III Cu processor is probably fine, but the Cu and non-Cu modules
> are incompatible. However, I couldn't find any beepcodes in the Sun
> documentation for the Blade 1000/2000 series, so I'm not really sure
> where the system is finding the fault.


It *might* be that there is identification coding in the module
saying whether it is a Cu or non-Cu CPU, as the documentation strongly
advises against mixing the two.

> Also for those who may be interested, the UltraSPARC III processors
> are pinless and actually do not have any thermal compound between the
> processor die and heat sink.


Interesting.

> Instead they have a clear plastic sheet
> with a square of aluminum foil in the center which makes thermal
> contact with both sides.


Is there something like a thin film of silicone grease on the
aluminum foil? (Both sides, or neither side. :-)

> Removing the heat sink and processor follows
> a process identical to removing the processor from an Enterprise 4500
> CPU tray, and I'd imagine the torque stated in the E4500 service
> manual for US II CPUs is the same for US III since the heat sinks and
> processor/board contacts are identical. Unscrew the heat sink's
> captive screws on the diagonals at half-turn intervals until they all
> loosen up, then pull the heat sink off and pry the processor die from
> the plastic surround.


O.K. That sounds straightforward enough.

> Anyway, if anyone knows what three beeps mean, please let me know.


Hmm ... you can get more diagnostic information if you unplug the
keyboard and connect a serial terminal (or a computer pretending to be a
serial terminal) to the TTYA connector. Unless the EEPROM settings have
been changed, it should be 9600 baud, 8 bits, no parity, 1 stop bit.
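
If the "computer pretending to be a serial terminal" is a Unix box, the line settings above can be applied with Python's standard `termios` module. This is a minimal sketch, not a full terminal program; the device path in the usage comment is hypothetical and will differ per machine.

```python
import termios


def set_9600_8n1(fd):
    """Configure an open tty file descriptor for 9600 baud, 8N1, raw mode."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = termios.tcgetattr(fd)

    # 8 data bits, no parity, 1 stop bit.
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    cflag |= termios.CS8 | termios.CREAD | termios.CLOCAL

    # Raw mode: no echo, no line editing, no software flow control.
    lflag &= ~(termios.ICANON | termios.ECHO | termios.ISIG)
    iflag &= ~(termios.IXON | termios.IXOFF | termios.ICRNL)
    oflag &= ~termios.OPOST

    termios.tcsetattr(
        fd,
        termios.TCSANOW,
        [iflag, oflag, cflag, lflag, termios.B9600, termios.B9600, cc],
    )


# Usage (hypothetical device path, adjust for your system):
#   import os
#   fd = os.open("/dev/ttyS0", os.O_RDWR | os.O_NOCTTY)
#   set_9600_8n1(fd)
#   print(os.read(fd, 1024))
```

An existing terminal emulator set to the same parameters does the same job; the sketch only shows what "9600 8N1" means at the driver level.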

It also may be that the CPU needs a different supply voltage
with different regulators on the CPU module.

There is also a 6-pin jumper block pad on the module (opposite
end from the one with the orange barcode label) and the jumper
configuration might be different. All my 900 MHz and 1200 MHz Cu CPUs
are in systems running full time, so I can't take a look at anything
other than the 750 MHz one, which has a single jumper block as follows:

Connector edge of board
|
(CAP)(CAP)(CAP) |
|
+ o |
+ o | Side of board.
o o |

The two pins marked with '+' are joined with the jumper block. Compare
the jumper position with that of the original (faulty) CPU module.

Hmm ... on the smaller board behind the front panel near the end
with the jumper blocks there is a bridge rectifier, and near the other
end (behind the small heat sink and the end of the main heat sink) there
are six 4-pin packages which are fairly large and which *might* be
regulator modules, though I would expect heat sinks on regulator
modules.

> Otherwise, I guess I am just stuck with one US III Cu for now. Thanks
> for everyone's advice with this.
>
> glenn k. lockwood
>
> ps. I took photos of the dismantled CPU module; if this is of interest
> to any aspiring sparc hackers, send me an email.


I would be interested -- but I can't receive attachments of that
size, as the e-mail server is configured to reject anything over 30K
total mail size to keep viruses out of a couple of small mailing lists
(with vulnerable Windows users) which I operate.

My e-mail address above and below is valid, but I would need a
URL to download the images from, not to receive them directly via
e-mail.

Best of luck,
DoN.

 
 
Huge

 
      07-06-2009, 08:04 AM
On 2009-07-05, DoN. Nichols <> wrote:
> On 2009-07-05, Huge <> wrote:
>> On 2009-07-05, Dave <> wrote:
>>> glennklockwood wrote:
>>>> Hi again.
>>>>
>>>> I've been having a peculiar problem with my Sun Blade 1000, which I
>>>> just upgraded to dual 900MHz US-III Cu processors. The system worked
>>>> fine for a few days, but today I found that when I pressed the power
>>>> button, it powered on for two to three seconds (fans start, front
>>>> lights up), then it shut off. There isn't enough time for anything to
>>>> come across the serial console, and subsequent attempts to power on
>>>> result in the same thing happening.

>>
>> I have exactly the same problem with my (now retired) S/B 2000. Tightening
>> the CPU mounting screws reduced it from "every time" to "occasionally" so
>> I really ought to take them out and reseat them.

>
> Do you use the torque limiting screwdriver for installing the
> CPUs?


Yes. Both CPUs were insufficiently torqued down.


--
http://hyperangry.blogspot.com/
[email me, if you must, at huge {at} huge (dot) org <dot> uk]
 
 
Steve Firth

 
      07-06-2009, 04:42 PM
Dave <> wrote:

> > I have exactly the same problem with my (now retired) S/B 2000. Tightening
> > the CPU mounting screws reduced it from "every time" to "occasionally" so
> > I really ought to take them out and reseat them.
> >
> > BTW, the machine is for sale (I'm in the UK) if anyone wants it.
> >

>
> As a matter of interest, what are you replacing it with? I assume it's
> not going to be Microsoft's Windows Vista 7!


Dunno what Huge is doing, but I replaced my old server with a V20Z with
dual 2.5GHz Opterons and 4GB of RAM. The OS is ESXi with Ubuntu Guest
OSen and separate VMs for mail, news, ftp, MySQL, DNS/DHCP, Apache.
Storage is a PITA since the maximum that one can fit internally is
320GB. I can't find any U320 SCSI drives bigger than 320GB and the RAID
controller only supports RAID 1.

I've now got an external homebrew iSCSI SAN with 4TB and will be
jiggering that soon to have an Atom 330 ITX mainboard to save juice.

Overall I'm pleased with it, it's let me play about with stuff that I
wanted to play about with and it's blindingly fast compared to the
previous Dual Celeron Server. The V20Z cost all of £150 from Fleabay so
I'm very pleased with that. The only negative is that it's the noisiest
server I've ever worked with, let alone owned.

 