Motherboard Forums



Bad RAM or not?

 
 
Eric
Guest
Posts: n/a
 
      06-21-2011, 07:00 PM
How reliable is 'memtester'?

I have an Ultra 10 running Sol 8 and memtester is failing towards the
end of the first cycle (don't have the exact error message). However
the memory seems to be passing POST. Some guy had me run the full
diagnostics (setenv diag-level max; setenv diag-switch? True) and it
passed that too.

As if that weren't enough, I have another Ultra 10 (same Solaris) that
reports a memory error on boot (looks like during boot-up not POST)
but has no problems with memtester, ran a few dozen cycles with no
trouble.

So my questions: Is memtester any good? Even within its limitation
of only being able to test free memory, is it reliable?
Alternatively, should I accept the POST results as gospel and s-can
memtester?

TIA,
eric
 
 
 
 
 
Winston
Guest
Posts: n/a
 
      06-22-2011, 01:10 AM
Eric <(E-Mail Removed)> writes:
> How reliable is 'memtester'?


Sorry, I have no idea.

> I have an Ultra 10 running Sol 8 and memtester is failing towards the
> end of the first cycle (don't have the exact error message). However
> the memory seems to be passing POST. Some guy had me run the full
> diagnostics (setenv diag-level max; setenv diag-switch? True) and it
> passed that too.


Does printenv show "selftest-#megs"? If so, does it match the machine's
memory size? If it's smaller, POST is only testing part of the memory,
and if the tested part of memory is fine, POST passes.

> I have another Ultra 10 ...


The boot (EE)PROM should have a memory test diagnostic, with more
options than the POST test. Try running that on the questionable memory.
-WBE
 
 
 
 
 
Eric
Guest
Posts: n/a
 
      07-07-2011, 09:30 PM
On Jun 21, 8:10 pm, (E-Mail Removed) (Winston) wrote:
>
> Does printenv show "selftest-#megs"? If so, does it match the machine's
> memory size? If it's smaller, POST is only testing part of the memory,
> and if the tested part of memory is fine, POST passes.
>

I have no idea what you mean by "printenv" or "selftest-#megs". I'll
have to dig through my OpenBoot doc. I'm not seeing anything grossly
wrong and when the Ultra goes through the extended diagnostics it says
everything passed.

>
> The boot (EE)PROM should have a memory test diagnostic, with more
> options than the POST test. Try running that on the questionable memory.
> -WBE

I guess my language is sloppy, when I refer to POST, I mean everything
that happens from the time you press the ON button until the time the
Ultra tries to boot off of something. In that context, an extended
diagnostic was run (I'm pretty sure).

How about another approach. It turns out I have SunVTS on this thing
(ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
a "functional mode" test with pmemtest as well as all the processor
tests. I plan on running this test overnight. If I'm trying to smoke
out bad memory, is this a good way to do it?

TIA,
eric

 
 
DoN. Nichols
Guest
Posts: n/a
 
      07-08-2011, 02:35 AM
On 2011-07-07, Eric <(E-Mail Removed)> wrote:
> On Jun 21, 8:10 pm, (E-Mail Removed) (Winston) wrote:
>>
>> Does printenv show "selftest-#megs"? If so, does it match the machine's
>> memory size? If it's smaller, POST is only testing part of the memory,
>> and if the tested part of memory is fine, POST passes.
>>

> I have no idea what you mean by "printenv" or "selftest-#megs". I'll
> have to dig through my OpenBoot doc. I'm not seeing anything grossly
> wrong and when the Ultra goes through the extended diagnostics it says
> everything passed.


Everything that it *tested* passed. The setting of the
"selftest-#megs" variable (if present) will control how much memory is
tested. With a slow system and a lot of RAM, it can slow boot
significantly, so it is common practice to set it to a minimum value
(e.g. 1 MB) after the system is initially proven to work, until problems
come up again.

"printenv" (when typed at the OBP level) shows all the
environment variables which set lots of things which take effect at boot
time. You can see these all by typing (from a booted system) "eeprom".

Some clips from mine (on a Sun Blade 2000) show:

======================================================================
diag-passes=1
enclosure-type=540-3256-15
banner-name=Sun Blade 2000/1000
energystar-enabled?=true
pcia-probe-list=4,1

[ ... ]

ansi-terminal?=true
screen-#columns=80
screen-#rows=34
ttyb-rts-dtr-off=false

[ ... ]

security-#badlogins=0
#power-cycles=48
diag-script=none
diag-level=min
diag-switch?=false
error-reset-recovery=boot
======================================================================

This one does not have the "selftest-#megs" option, or if it
does, it is only visible from the OBP level.

But an SS-5 (much older machine, of course) has it:

======================================================================
screen-#rows=34
selftest-#megs=1
scsi-initiator-id=7
======================================================================

If it is present, it defines how much memory is tested during the normal
POST.
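
As a rough sketch of the check being discussed here (not something posted in the thread itself): from a booted system you could compare the selftest-#megs value printed by eeprom with the "Memory size" line printed by prtconf. The function names below are made up for illustration, and the sample values (an SS-5 with 64 MB) are assumptions, not output from a real machine. Both texts are passed in as arguments so the logic can be tried anywhere.

```shell
#!/bin/sh
# Sketch only: selftest_megs, installed_megs and check_coverage are
# hypothetical helper names, not Solaris commands.

# Print the value of selftest-#megs from eeprom-style text (nothing if absent).
selftest_megs() {
    printf '%s\n' "$1" | sed -n 's/^selftest-#megs=\([0-9][0-9]*\).*/\1/p'
}

# Print the size in MB from a prtconf-style "Memory size: N Megabytes" line.
installed_megs() {
    printf '%s\n' "$1" | sed -n 's/^Memory size: \([0-9][0-9]*\) Megabytes.*/\1/p'
}

# check_coverage EEPROM_TEXT PRTCONF_TEXT
check_coverage() {
    tested=$(selftest_megs "$1")
    total=$(installed_megs "$2")
    if [ -z "$tested" ]; then
        echo "selftest-#megs not present"
    elif [ -z "$total" ]; then
        echo "memory size not found"
    elif [ "$tested" -lt "$total" ]; then
        echo "POST tests only $tested of $total MB"
    else
        echo "POST covers all $total MB"
    fi
}

# Assumed sample data: the SS-5 clip above plus a made-up 64 MB memory size.
check_coverage 'screen-#rows=34
selftest-#megs=1
scsi-initiator-id=7' 'Memory size: 64 Megabytes'
```

On a live system, something like check_coverage "$(eeprom)" "$(prtconf)" should feed it real data, assuming both commands are on the PATH. If the variable is missing, as on the Sun Blade above, the sketch just says so.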

Note, BTW, that the weird characters in there are part of the
variable names, and indicate that the contents are something other than
a plain string. A '?' indicates that it is expecting either "true" or
"false", and a '#' indicates that it is expecting a number value.

These *must* be present. No problem at the OBP level, but from
a booted system, the shells have special meanings for '?' and '#', so
you have to put the whole line in double quotes, or put '\' in front of
each such character.

And -- only root can change the settings from a booted system
using the "eeprom" command, though anybody can *look* at them.

And from the OBP you use the "setenv" command, without an '='
sign (but with a space), while from the eeprom command from a booted
system, it will be something like:

eeprom "selftest-#megs=40"

Or whatever value represents the whole of memory.
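
To illustrate why the quoting matters, here is a small sketch for a Bourne-type shell. show_arg is a hypothetical stand-in for eeprom that only prints the argument it receives, and the file name is made up to force a glob match. It assumes mktemp -d is available, which may not be true on an old Solaris install.

```shell
#!/bin/sh
# show_arg is a hypothetical stand-in for eeprom: it prints exactly what the
# shell passes to the command.
show_arg() { printf '%s\n' "$1"; }

# Work in an empty scratch directory and create a file whose name happens
# to match the glob pattern diag-switch?=true.
dir=$(mktemp -d) && cd "$dir" || exit 1
: > 'diag-switchX=true'

show_arg diag-switch?=true      # unquoted '?' is a glob: prints diag-switchX=true
show_arg "diag-switch?=true"    # double quotes keep it literal: diag-switch?=true
show_arg diag-switch\?=true     # a backslash also works: diag-switch?=true
show_arg "selftest-#megs=40"    # same form as the eeprom example above

# Clean up the scratch directory.
cd / && rm -r "$dir"
```

Without a matching file, a Bourne-type shell would usually pass the unquoted word through unchanged, so the unquoted form can appear to work for a long time before it bites. Quoting every time avoids depending on that.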

The things which control the boot time diagnostics on my SB-2000 are:

======================================================================
diag-passes=1
diag-file: data not available.
diag-device=disk:f
diag-script=none
diag-level=min
diag-switch?=false
======================================================================

Currently, they are set to bypass the majority of the tests, and will
remain so until I start having troubles with the system.

>>
>> The boot (EE)PROM should have a memory test diagnostic, with more
>> options than the POST test. Try running that on the questionable memory.
>> -WBE

> I guess my language is sloppy, when I refer to POST, I mean everything
> that happens from the time you press the ON button until the time the
> Ultra tries to boot off of something. In that context, an extended
> diagnostic was run (I'm pretty sure).


That is where the "selftest-#megs" value (if present) limits the
amount of memory that is tested.

There are other test options which can be only invoked from the
OBP prompt.

> How about another approach. It turns out I have SunVTS on this thing
> (ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
> a "functional mode" test with pmemtest as well as all the processor
> tests. I plan on running this test overnight. If I'm trying to smoke
> out bad memory, is this a good way to do it?


I don't find "pmemtest" on my Solaris 10. Nor do I find it on
systems running Solaris 2.6, so I have no idea what it really does, but
I suspect that it can't touch the memory in which the kernel is running.

However -- the "diag-level" (in the OBP) can be set to "max", and
the "diag-switch?" set to "true" for the maximum test during the boot
process. Or -- you can invoke certain tests from the command line in
OBP. Try something like "test ?" to get a list of test options.
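
The same two settings can also be changed from a booted system with eeprom, as root. The sketch below only prints the commands by default. DRY_RUN, run_eeprom and set_diag are hypothetical names invented for this example, not part of Solaris.

```shell
#!/bin/sh
# Sketch only: with DRY_RUN=1 (the default here) nothing is changed and the
# eeprom commands are just printed. Set DRY_RUN=0 as root to really run them.
DRY_RUN=${DRY_RUN:-1}

# run_eeprom SETTING: print or execute one eeprom assignment.
run_eeprom() {
    if [ "$DRY_RUN" = 1 ]; then
        printf 'would run: eeprom "%s"\n' "$1"
    else
        eeprom "$1"
    fi
}

# set_diag LEVEL SWITCH, e.g. "set_diag max true" or "set_diag min false"
set_diag() {
    run_eeprom "diag-level=$1"
    run_eeprom "diag-switch?=$2"
}

# Ask for the full diagnostics on the next boot.
set_diag max true
```

Running set_diag min false afterwards would put things back to the quick-boot values shown for the SB-2000 above.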

Good Luck,
DoN.

--
Remove oil spill source from e-mail
Email: <(E-Mail Removed)> | Voice (all times): (703) 938-4564
(too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
--- Black Holes are where God is dividing by zero ---
 
 
Eric
Guest
Posts: n/a
 
      07-20-2011, 06:01 PM
On Jul 7, 9:35 pm, "DoN. Nichols" <(E-Mail Removed)> wrote:
> On 2011-07-07, Eric <(E-Mail Removed)> wrote:
>
> > On Jun 21, 8:10 pm, (E-Mail Removed) (Winston) wrote:
>
> >> Does printenv show "selftest-#megs"? If so, does it match the machine's
> >> memory size? If it's smaller, POST is only testing part of the memory,
> >> and if the tested part of memory is fine, POST passes.
>
> > I have no idea what you mean by "printenv" or "selftest-#megs". I'll
> > have to dig through my OpenBoot doc. I'm not seeing anything grossly
> > wrong and when the Ultra goes through the extended diagnostics it says
> > everything passed.
>
> Everything that it *tested* passed. The setting of the
> "selftest-#megs" variable (if present) will control how much memory is
> tested. With a slow system and a lot of RAM, it can slow boot
> significantly, so it is common practice to set it to a minimum value
> (e.g. 1 MB) after the system is initially proven to work, until problems
> come up again.
>
> "printenv" (when typed at the OBP level) shows all the
> environment variables which set lots of things which take effect at boot
> time. You can see these all by typing (from a booted system) "eeprom".
>


I thought it goes without saying that you can only pass/fail what you
test so I didn't say it. But I did check, and I don't have the
"selftest-#megs" variable nor do I have the "diag-passes" variable
which seems like a nifty thing to have.

>
> Note, BTW, that the weird characters in there are part of the
> variable names, and indicate that the contents are something other than
> a plain string. A '?' indicates that it is expecting either "true" or
> "false", and a '#' indicates that it is expecting a number value.
>


Makes sense. There's also that thing where you have to spell "True"
with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
is their way of verifying intent.

>
> > How about another approach. It turns out I have SunVTS on this thing
> > (ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
> > a "functional mode" test with pmemtest as well as all the processor
> > tests. I plan on running this test overnight. If I'm trying to smoke
> > out bad memory, is this a good way to do it?
>
> I don't find "pmemtest" on my Solaris 10. Nor do I find it on
> systems running Solaris 2.6, so I have no idea what it really does, but
> I suspect that it can't touch the memory in which the kernel is running.
>


On sunvts 4.0, which I ran under the CDE environment, using the
"logical" system map you can expand the "memory" item and select
"vmemtest" which tests physical memory and swap or you can select
"pmemtest" which tests only physical memory. The setup dialog for the
pmemtest says you can test 100% of the memory but the doc says it's a
read-only test so I don't know how useful that is. From looking at
the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
up the same way.

Last week, after being away for two weeks, I ran the whole bunch of
tests; vts, memtester, POST and now everything checks out fine, no
errors, no nothing. It's baffling but what can you do?

Don, you've been outrageously helpful with this and also with that
video card issue I had. Thank you very much.

eric


 
 
DoN. Nichols
Guest
Posts: n/a
 
      07-21-2011, 03:31 AM
On 2011-07-20, Eric <(E-Mail Removed)> wrote:
> On Jul 7, 9:35 pm, "DoN. Nichols" <(E-Mail Removed)> wrote:
>> On 2011-07-07, Eric <(E-Mail Removed)> wrote:
>>
>> > On Jun 21, 8:10 pm, (E-Mail Removed) (Winston) wrote:


[ ... ]

>> Everything that it *tested* passed. The setting of the
>> "selftest-#megs" variable (if present) will control how much memory is
>> tested. With a slow system and a lot of RAM, it can slow boot
>> significantly, so it is common practice to set it to a minimum value
>> (e.g. 1 MB) after the system is initially proven to work, until problems
>> come up again.
>>
>> "printenv" (when typed at the OBP level) shows all the
>> environment variables which set lots of things which take effect at boot
>> time. You can see these all by typing (from a booted system) "eeprom".
>>

>
> I thought it goes without saying that you can only pass/fail what you
> test so I didn't say it . But I did check, and I don't have the
> "selftest-#megs" variable nor do I have the "diag-passes" variable
> which seems like a nifty thing to have.


O.K. The Sun Fire V120, and the Sun Fire 280R/Blade-[12]000
have diag-passes, but not "selftest-#megs", while the SPARCstation 5 has
the selftest-#megs, but not the diag-passes. Those are the only systems
which are running at the moment (several examples of each), and I can't
remember what older systems had. The two SS-5s are approaching
retirement now.

>>
>> Note, BTW, that the weird characters in there are part of the
>> variable names, and indicate that the contents are something other than
>> a plain string. A '?' indicates that it is expecting either "true" or
>> "false", and a '#' indicates that it is expecting a number value.
>>

>
> Makes sense. There's also that thing where you have to spell "True"
> with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
> is their way of verifying intent.


Hmm ... none of the systems which I have show anything other
than lower case for both "true" and "false". Did I perhaps add that
case distinction when I was typing -- or could it be that your system
has it and mine does not (different versions of OBP)?

>>
>> > How about another approach. It turns out I have SunVTS on this thing
>> > (ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
>> > a "functional mode" test with pmemtest as well as all the processor
>> > tests. I plan on running this test overnight. If I'm trying to smoke
>> > out bad memory, is this a good way to do it?
>>
>> I don't find "pmemtest" on my Solaris 10. Nor do I find it on
>> systems running Solaris 2.6, so I have no idea what it really does, but
>> I suspect that it can't touch the memory in which the kernel is running.
>>

>
> On sunvts 4.0, which I ran under the CDE environment, using the
> "logical" system map you can expand the "memory" item and select
> "vmemtest" which tests physical memory and swap or you can select
> "pmemtest" which tests only physical memory. The setup dialog for the
> pmemtest says you can test 100% of the memory but the doc says it's a
> read-only test so I don't know how useful that is. From looking at
> the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
> up the same way.


O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
perhaps? If so, I don't have it. Did it come from the distribution
CD-ROM, or from a service contract? I don't have the latter, so I am
unlikely to have it.

> Last week, after being away for two weeks, I ran the whole bunch of
> tests; vts, memtester, POST and now everything checks out fine, no
> errors, no nothing. It's baffling but what can you do?


Compare room temperature when failures occur vs when they
don't?

> Don, you've been outrageously helpful with this and also with that
> video card issue I had. Thank you very much.


Glad to help as I can. I've been playing with these systems
since I got my first Sun 2/120 -- and had the chance to work as a
Sysadmin for a few years before I retired.

Good Luck,
DoN.

 
 
Eric
Guest
Posts: n/a
 
      07-21-2011, 03:39 PM
On Jul 20, 10:31 pm, "DoN. Nichols" <(E-Mail Removed)> wrote:
> On 2011-07-20, Eric <(E-Mail Removed)> wrote:
>
>
> >> Note, BTW, that the weird characters in there are part of the
> >> variable names, and indicate that the contents are something other than
> >> a plain string. A '?' indicates that it is expecting either "true" or
> >> "false", and a '#' indicates that it is expecting a number value.
>
> > Makes sense. There's also that thing where you have to spell "True"
> > with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
> > is their way of verifying intent.
>
> Hmm ... none of the systems which I have show anything other
> than lower case for both "true" and "false". Did I perhaps add that
> case distinction when I was typing -- or could it be that your system
> has it and mine does not (different versions of OBP)?
>


Probably differences in OBP. My u/c, l/c comment was based on my
recent experience. If I typed in the switch with the wrong case OBP
wouldn't accept it. I think one of the Ultras has an OBP version of
3.25 and the other is version 3.33, I think. I have a third,
production, machine which I think is even older, I want to say 3.15
but I really don't recall.

>
>
> >> > How about another approach. It turns out I have SunVTS on this thing
> >> > (ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
> >> > a "functional mode" test with pmemtest as well as all the processor
> >> > tests. I plan on running this test overnight. If I'm trying to smoke
> >> > out bad memory, is this a good way to do it?
>
> >> I don't find "pmemtest" on my Solaris 10. Nor do I find it on
> >> systems running Solaris 2.6, so I have no idea what it really does, but
> >> I suspect that it can't touch the memory in which the kernel is running.

>
> > On sunvts 4.0, which I ran under the CDE environment, using the
> > "logical" system map you can expand the "memory" item and select
> > "vmemtest" which tests physical memory and swap or you can select
> > "pmemtest" which tests only physical memory. The setup dialog for the
> > pmemtest says you can test 100% of the memory but the doc says it's a
> > read-only test so I don't know how useful that is. From looking at
> > the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
> > up the same way.

>
> O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
> perhaps? If so, I don't have it. Did it come from the distribution
> CD-ROM, or from a service contract? I don't have the latter, so I am
> unlikely to have it.
>


Exactly. The Ultras were supplied to us by our instrument's
manufacturer (we have a mass spec that uses an Ultra for acquisition
and control). I'm not sure where they got it, might be a separate
add-on. And unfortunately it seems you can't download anything from
Oracle unless you have a contract, which really blows.

> > Last week, after being away for two weeks, I ran the whole bunch of
> > tests; vts, memtester, POST and now everything checks out fine, no
> > errors, no nothing. It's baffling but what can you do?
>
> Compare room temperature when failures occur vs when they
> don't?
>


Good point. The place where I had failures is warmer than where I'm
testing now but it's only by, I'm guessing, 5 deg F or so. Yeah,
that's interesting and it's giving me a kind of sick feeling.



Thanks,
eric
 
 
DoN. Nichols
Guest
Posts: n/a
 
      07-21-2011, 10:47 PM
On 2011-07-21, Eric <(E-Mail Removed)> wrote:
> On Jul 20, 10:31 pm, "DoN. Nichols" <(E-Mail Removed)> wrote:


[ ... ]

>> Hmm ... none of the systems which I have show anything other
>> than lower case for both "true" and "false". Did I perhaps add that
>> case distinction when I was typing -- or could it be that your system
>> has it and mine does not (different versions of OBP)?
>>

>
> Probably differences in OBP. My u/c, l/c comment was based on my
> recent experience. If I typed in the switch with the wrong case OBP
> wouldn't accept it. I think one of the Ultras has an OBP version of
> 3.25 and the other is version 3.33, I think. I have a third,
> production, machine which I think is even older, I want to say 3.15
> but I really don't recall.


O.K. My Sun Blade 1000 and 2000 systems have:

OBP 4.16.4 2004/12/18 05:18

The Sun Fire V120 has:

CORE 1.0.17 2003/10/06 17:09

And the SS-5 has:

OBP 4.16.4 2004/12/18 05:18
POST 4.16.3 2004/11/05 20:02

[ ... ]

>> O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
>> perhaps? If so, I don't have it. Did it come from the distribution
>> CD-ROM, or from a service contract? I don't have the latter, so I am
>> unlikely to have it.
>>

>
> Exactly. The Ultras were supplied to us by our instrument's
> manufacturer (we have a mass spec that uses an Ultra for acquisition
> and control). I'm not sure where they got it, might be a separate
> add-on. And unfortunately it seems you can't download anything from
> Oracle unless you have a contract, which really blows.


Agreed. Having them take over Sun was a disaster.

>> > Last week, after being away for two weeks, I ran the whole bunch of
>> > tests; vts, memtester, POST and now everything checks out fine, no
>> > errors, no nothing. It's baffling but what can you do?
>>
>> Compare room temperature when failures occur vs when they
>> don't?
>>

>
> Good point. The place where I had failures is warmer than where I'm
> testing now but it's only by, I'm guessing, 5 deg F or so. Yeah,
> that's interesting and it's giving me a kind of sick feeling.


It may simply argue for more air conditioning where they are
normally used -- assuming that is an option.

Good Luck,
DoN.

 
 
 
 