
Bad RAM or not?

Discussion in 'Sun Hardware' started by Eric, Jun 21, 2011.

  1. Eric

    Eric Guest

    How reliable is 'memtester'?

    I have an Ultra 10 running Sol 8 and memtester is failing towards the
    end of the first cycle (don't have the exact error message). However
    the memory seems to be passing POST. Some guy had me run the full
    diagnostics (setenv diag-level max; setenv diag-switch? True) and it
    passed that too.

    As if that weren't enough, I have another Ultra 10 (same Solaris) that
    reports a memory error on boot (looks like during boot-up not POST)
    but has no problems with memtester, ran a few dozen cycles with no
    trouble.

    So my questions: Is memtester any good? Even within its limitation
    of only being able to test free memory, is it reliable?
    Alternatively, should I accept the POST results as gospel and s-can
    memtester?

    TIA,
    eric
     
    Eric, Jun 21, 2011
    #1

  2. Winston

    Winston Guest

    Eric <> writes:
    > How reliable is 'memtester'?


    Sorry, I have no idea.

    > I have an Ultra 10 running Sol 8 and memtester is failing towards the
    > end of the first cycle (don't have the exact error message). However
    > the memory seems to be passing POST. Some guy had me run the full
    > diagnostics (setenv diag-level max; setenv diag-switch? True) and it
    > passed that too.


    Does printenv show "selftest-#megs"? If so, does it match the machine's
    memory size? If it's smaller, POST is only testing part of the memory,
    and if the tested part of memory is fine, POST passes.
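
    The check described above can be sketched as a small shell helper. This
    is only an illustration: post_coverage is a hypothetical name, and the
    two input formats (eeprom printing "selftest-#megs=N", prtconf printing
    "Memory size: N Megabytes") are assumptions that may not hold on every
    release or on machines that lack the variable.

```shell
# Sketch: compare the amount of memory POST is told to test with what is installed.
# Assumed input formats (not verified on every release):
#   eeprom 'selftest-#megs'        -> selftest-#megs=1
#   prtconf | grep '^Memory size'  -> Memory size: 512 Megabytes
# post_coverage is a hypothetical helper name.
post_coverage() {
    tested=${1#*=}    # text after the '='
    installed=$(printf '%s\n' "$2" |
        sed -n 's/^Memory size: \([0-9][0-9]*\) Megabytes.*$/\1/p')
    # Refuse anything that is not a plain number (e.g. "data not available").
    case $tested in ''|*[!0-9]*) echo "could not parse input"; return 2 ;; esac
    case $installed in ''|*[!0-9]*) echo "could not parse input"; return 2 ;; esac
    if [ "$tested" -lt "$installed" ]; then
        echo "POST tests only $tested of $installed MB"
    else
        echo "POST covers all $installed MB"
    fi
}

# Example with canned strings, so it runs anywhere:
post_coverage 'selftest-#megs=1' 'Memory size: 512 Megabytes'
```

    On a live system the arguments would presumably come from
    "$(eeprom 'selftest-#megs')" and "$(prtconf | grep '^Memory size')".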

    > I have another Ultra 10 ...


    The boot (EE)PROM should have a memory test diagnostic, with more
    options than the POST test. Try running that on the questionable memory.
    -WBE
     
    Winston, Jun 22, 2011
    #2

  3. Eric

    Eric Guest

    On Jun 21, 8:10 pm, (Winston) wrote:
    >
    >    Do printenv have "selftest-#megs"?  If so, does it match the machine's
    > memory size?  If it's smaller, POST is only testing part of the memory,
    > and if the tested part of memory is fine, POST passes.
    >

    I have no idea what you mean by "printenv" or "selftest-#megs". I'll
    have to dig through my OpenBoot doc. I'm not seeing anything grossly
    wrong and when the Ultra goes through the extended diagnostics it says
    everything passed.

    >
    >    The boot (EE)PROM should have a memory test diagnostic, with more
    > options than the POST test.  Try running that on the questionable memory.
    >  -WBE

    I guess my language is sloppy, when I refer to POST, I mean everything
    that happens from the time you press the ON button until the time the
    Ultra tries to boot off of something. In that context, an extended
    diagnostic was run (I'm pretty sure).

    How about another approach. It turns out I have SunVTS on this thing
    (ver 4.0 to go w/ Solaris 8). In multiuser mode, under CDE I'm running
    a "functional mode" test with pmemtest as well as all the processor
    tests. I plan on running this test overnight. If I'm trying to smoke
    out bad memory, is this a good way to do it?

    TIA,
    eric
     
    Eric, Jul 7, 2011
    #3
  4. DoN. Nichols

    DoN. Nichols Guest

    On 2011-07-07, Eric <> wrote:
    > On Jun 21, 8:10 pm, (Winston) wrote:
    >>
    >>    Do printenv have "selftest-#megs"?  If so, does it match the machine's
    >> memory size?  If it's smaller, POST is only testing part of the memory,
    >> and if the tested part of memory is fine, POST passes.
    >>

    > I have no idea what you mean by "printenv" or "selftest-#megs". I'll
    > have to dig
    > through my OpenBoot doc. I'm not seeing anything grossly wrong and
    > when
    > the Ultra goes through the extended diagnostics it says everything
    > passed.


    Everything that it *tested* passed. The setting of the
    "selftest-#megs" variable (if present) will control how much memory is
    tested. With a slow system and a lot of RAM, it can slow boot
    significantly, so it is common practice to set it to a minimum value
    (e.g. 1 MB) after the system is initially proven to work, until problems
    come up again.

    "printenv" (when typed at the OBP level) shows all the
    environment variables which set lots of things which take effect at boot
    time. You can see these all by typing (from a booted system) "eeprom".

    Some clips from mine (on a SunBlade 2000) show:

    ======================================================================
    diag-passes=1
    enclosure-type=540-3256-15
    banner-name=Sun Blade 2000/1000
    energystar-enabled?=true
    pcia-probe-list=4,1

    [ ... ]

    ansi-terminal?=true
    screen-#columns=80
    screen-#rows=34
    ttyb-rts-dtr-off=false

    [ ... ]

    security-#badlogins=0
    #power-cycles=48
    diag-script=none
    diag-level=min
    diag-switch?=false
    error-reset-recovery=boot
    ======================================================================

    This one does not have the "selftest-#megs" option, or if it
    does, it is only visible from the OBP level.

    But a SS-5 (much older machine, of course) has it:

    ======================================================================
    screen-#rows=34
    selftest-#megs=1
    scsi-initiator-id=7
    ======================================================================

    If it is present, it defines how much memory is tested during the normal
    POST.

    Note, BTW, that the weird characters in there are part of the
    variable names, and indicate that the contents are something other than
    a plain string. A '?' indicates that it is expecting either "true" or
    "false", and a '#' indicates that it is expecting a number value.

    These *must* be present. No problem at the OBP level, but from
    a booted system, the shells have special meanings for '?' and '#', so
    you have to put the whole line in double quotes, or put '\' in front of
    each such character.

    And -- only root can change the settings from a booted system
    using the "eeprom" command, though anybody can *look* at them.

    And from the OBP you use the "setenv" command, without an '='
    sign (but with a space), while from the eeprom command from a booted
    system, it will be something like:

    eeprom "selftest-#megs=40"

    Or whatever value represents the whole of memory.
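
    The quoting advice above can be demonstrated without touching eeprom at
    all. The sketch below is an illustration only: it uses a scratch
    directory (via mktemp, assumed to be available) and printf to show what
    a POSIX shell does with an unquoted '?'. In POSIX shells '#' only starts
    a comment at the beginning of a word, so it is usually the '?' that
    bites; quoting the whole argument covers both.

```shell
# Demonstrates why names such as diag-switch? are best quoted in a POSIX shell.
# Uses a scratch directory and printf only; nothing here calls eeprom.
start_dir=$PWD
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch diag-switchX                      # a file the pattern diag-switch? happens to match

unquoted=$(printf '%s' diag-switch?)    # '?' is a glob: expands to the matching file name
quoted=$(printf '%s' "diag-switch?")    # quotes keep the '?' literal
escaped=$(printf '%s' diag-switch\?)    # a backslash does the same

printf 'unquoted: %s\nquoted:   %s\nescaped:  %s\n' "$unquoted" "$quoted" "$escaped"

cd "$start_dir" && rm -rf "$scratch"
```

    With no matching file in the current directory a Bourne-style shell
    passes the pattern through unchanged, which is why the unquoted form
    often appears to work until the day it does not.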

    The things which control the boot time diagnostics on my SB-2000 are:

    ======================================================================
    diag-passes=1
    diag-file: data not available.
    diag-device=disk:f
    diag-script=none
    diag-level=min
    diag-switch?=false
    ======================================================================

    Currently, they are set to bypass the majority of the tests, and will
    remain so until I start having troubles with the system.

    >>
    >>    The boot (EE)PROM should have a memory test diagnostic, with more
    >> options than the POST test.  Try running that on the questionable memory.
    >>  -WBE

    > I guess my language is sloppy, when I refer to POST, I mean everything
    > that
    > happens from the time you press the ON button until the time the Ultra
    > tries
    > to boot off of something. In that context, an extended diagnostic was
    > run
    > (I'm pretty sure).


    That is where the "selftest-#megs" value (if present) limits the
    amount of memory that is tested.

    There are other test options which can be only invoked from the
    OBP prompt.

    > How about another approach. It turns out I have SunVTS on this thing
    > (ver 4.0 to
    > go w/ Solaris 8). In multiuser mode, under CDE I'm running a
    > "functional mode"
    > test with pmemtest as well as all the processor tests. I plan on
    > running this test
    > overnight. If I'm trying to smoke out bad memory, is this a good way
    > to do it?


    I don't find "pmemtest" on my Solaris 10. Nor do I find it on
    systems running Solaris 2.6, so I have no idea what it really does, but
    I suspect that it can't touch the memory in which the kernel is running.

    However -- the "diag-level" (in the OBP) can be set to "max", and
    the "diag-switch?" set to "true" for the maximum test during the boot
    process. Or -- you can invoke certain tests from the command line in
    OBP. Try something like "test ?" to get a list of test options.
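
    For the booted-system side of this, a minimal sketch follows. It assumes
    the Solaris eeprom command run as root; set_diag and DRY_RUN are
    hypothetical names introduced for illustration, and by default the
    helper only prints the commands instead of running them.

```shell
# Sketch: turn on the fullest boot-time diagnostics from a booted system.
# Assumes Solaris eeprom(1M), run as root. set_diag and DRY_RUN are
# hypothetical names; by default the commands are only printed.
set_diag() {
    for setting in 'diag-switch?=true' 'diag-level=max'; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            printf "eeprom '%s'\n" "$setting"    # show the quoted command
        else
            eeprom "$setting"                    # quotes protect the '?'
        fi
    done
}

set_diag
```

    At the OBP prompt the equivalent would be "setenv diag-switch? true" and
    "setenv diag-level max". Setting the values back to "false" and "min"
    afterwards restores the faster boot.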

    Good Luck,
    DoN.

    --
    Remove oil spill source from e-mail
    Email: <> | Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---
     
    DoN. Nichols, Jul 8, 2011
    #4
  5. Eric

    Eric Guest

    On Jul 7, 9:35 pm, "DoN. Nichols" <> wrote:
    > On 2011-07-07, Eric <> wrote:
    >
    > > On Jun 21, 8:10 pm, (Winston) wrote:

    >
    > >> Do printenv have "selftest-#megs"? If so, does it match the machine's
    > >> memory size? If it's smaller, POST is only testing part of the memory,
    > >> and if the tested part of memory is fine, POST passes.

    >
    > > I have no idea what you mean by "printenv" or "selftest-#megs".  I'll
    > > have to dig
    > > through my OpenBoot doc.  I'm not seeing anything grossly wrong and
    > > when
    > > the Ultra goes through the extended diagnostics it says everything
    > > passed.

    >
    >         Everything that it *tested* passed.  The setting of the
    > "selftest-#megs" variable (if present) will control how much memory is
    > tested.  With a slow system and a lot of RAM, it can slow boot
    > significantly, so it is common practice to set it to a minimum value
    > (e.g. 1 MB) after the system is initially proven to work, until problems
    > come up again.
    >
    >         "printenv" (when typed at the OBP level) shows all the
    > environment variables which set lots of things which take effect at boot
    > time.  You can see these all by typing (from a booted system) "eeprom".
    >


    I thought it goes without saying that you can only pass/fail what you
    test so I didn't say it :). But I did check, and I don't have the
    "selftest-#megs" variable nor do I have the "diag-passes" variable
    which seems like a nifty thing to have.

    >
    >         Note, BTW, that the weird characters in there are part of the
    > variable names, and indicate that the contents are something other than
    > a plain string.  A '?' indicates that it is expecting either "true" or
    > "false", and a '#' indicates that it is expecting a number value.
    >


    Makes sense. There's also that thing where you have to spell "True"
    with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
    is their way of verifying intent.

    >
    > > How about another approach.  It turns out I have SunVTS on this thing
    > > (ver 4.0 to
    > > go w/ Solaris 8).  In multiuser mode, under CDE I'm running a
    > > "functional mode"
    > > test with pmemtest as well as all the processor tests.  I plan on
    > > running this test
    > > overnight.  If I'm trying to smoke out bad memory, is this a good way
    > > to do it?

    >
    >         I don't find "pmemtest" on my Solaris 10.  Nor do I find it on
    > systems running Solaris 2.6, so I have no idea what it really does, but
    > I suspect that it can't touch the memory in which the kernel is running.
    >


    On sunvts 4.0, which I ran under the CDE environment, using the
    "logical" system map you can expand the "memory" item and select
    "vmemtest" which tests physical memory and swap or you can select
    "pmemtest" which tests only physical memory. The setup dialog for the
    pmemtest says you can test 100% of the memory but the doc says it's a
    read-only test so I don't know how useful that is. From looking at
    the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
    up the same way.

    Last week, after being away for two weeks, I ran the whole bunch of
    tests; vts, memtester, POST and now everything checks out fine, no
    errors, no nothing. It's baffling but what can you do?

    Don, you've been outrageously helpful with this and also with that
    video card issue I had. Thank you very much.

    eric
     
    Eric, Jul 20, 2011
    #5
  6. DoN. Nichols

    DoN. Nichols Guest

    On 2011-07-20, Eric <> wrote:
    > On Jul 7, 9:35 pm, "DoN. Nichols" <> wrote:
    >> On 2011-07-07, Eric <> wrote:
    >>
    >> > On Jun 21, 8:10 pm, (Winston) wrote:


    [ ... ]

    >>         Everything that it *tested* passed.  The setting of the
    >> "selftest-#megs" variable (if present) will control how much memory is
    >> tested.  With a slow system and a lot of RAM, it can slow boot
    >> significantly, so it is common practice to set it to a minimum value
    >> (e.g. 1 MB) after the system is initially proven to work, until problems
    >> come up again.
    >>
    >>         "printenv" (when typed at the OBP level) shows all the
    >> environment variables which set lots of things which take effect at boot
    >> time.  You can see these all by typing (from a booted system) "eeprom".
    >>

    >
    > I thought it goes without saying that you can only pass/fail what you
    > test so I didn't say it :). But I did check, and I don't have the
    > "selftest-#megs" variable nor do I have the "diag-passes" variable
    > which seems like a nifty thing to have.


    O.K. The Sun Fire V120, and the Sun Fire 280R/Blade-[12]000
    have diag-passes, but not "selftest-#megs", while the SPARCstation 5 has
    the selftest-#megs, but not the diag-passes. Those are the only systems
    which are running at the moment (several examples of each), and I can't
    remember what older systems had. The two SS-5s are approaching
    retirement now.

    >>
    >>         Note, BTW, that the weird characters in there are part of the
    >> variable names, and indicate that the contents are something other than
    >> a plain string.  A '?' indicates that it is expecting either "true" or
    >> "false", and a '#' indicates that it is expecting a number value.
    >>

    >
    > Makes sense. There's also that thing where you have to spell "True"
    > with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
    > is their way of verifying intent.


    Hmm ... none of the systems which I have show anything other
    than lower case for both "true" and "false". Did I perhaps add that
    case distinction when I was typing -- or could it be that your system
    has it and mine does not (different versions of OBP)?

    >>
    >> > How about another approach.  It turns out I have SunVTS on this thing
    >> > (ver 4.0 to
    >> > go w/ Solaris 8).  In multiuser mode, under CDE I'm running a
    >> > "functional mode"
    >> > test with pmemtest as well as all the processor tests.  I plan on
    >> > running this test
    >> > overnight.  If I'm trying to smoke out bad memory, is this a good way
    >> > to do it?

    >>
    >>         I don't find "pmemtest" on my Solaris 10.  Nor do I find it on
    >> systems running Solaris 2.6, so I have no idea what it really does, but
    >> I suspect that it can't touch the memory in which the kernel is running.
    >>

    >
    > On sunvts 4.0, which I ran under the CDE environment, using the
    > "logical" system map you can expand the "memory" item and select
    > "vmemtest" which tests physical memory and swap or you can select
    > "pmemtest" which tests only physical memory. The setup dialog for the
    > pmemtest says you can test 100% of the memory but the doc says it's a
    > read-only test so I don't know how useful that is. From looking at
    > the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
    > up the same way.


    O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
    perhaps? If so, I don't have it. Did it come from the distribution
    CD-ROM, or from a service contract? I don't have the latter, so I am
    unlikely to have it.

    > Last week, after being away for two weeks, I ran the whole bunch of
    > tests; vts, memtester, POST and now everything checks out fine, no
    > errors, no nothing. It's baffling but what can you do?


    Compare room temperature when failures occur vs when they
    don't?
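
    One way to make that comparison is to log the room temperature next to
    every test result. The sketch below is an assumption-laden illustration:
    log_run and MEMLOG are hypothetical names, the temperature is typed in
    by hand, and it relies on the test program returning a non-zero exit
    status on failure (which memtester is documented to do).

```shell
# Sketch: record the ambient temperature next to each test result so failures
# can later be compared against room temperature.
# log_run and MEMLOG are hypothetical names; the temperature is typed in by hand.
MEMLOG=${MEMLOG:-./memtest-temps.log}

log_run() {
    temp_f=$1; shift                     # first argument: temperature in deg F
    if "$@" >/dev/null 2>&1; then        # remaining arguments: the test command
        result=PASS
    else
        result=FAIL
    fi
    printf '%s temp_F=%s result=%s cmd=%s\n' \
        "$(date '+%Y-%m-%d %H:%M')" "$temp_f" "$result" "$*" >> "$MEMLOG"
    [ "$result" = PASS ]                 # propagate pass/fail to the caller
}
```

    A run might look like "log_run 78 memtester 256M 1", with the size and
    loop count chosen to suit the machine. After a few days,
    "grep FAIL memtest-temps.log" shows whether failures cluster at the
    warmer readings.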

    > Don, you've been outrageously helpful with this and also with that
    > video card issue I had. Thank you very much.


    Glad to help as I can. I've been playing with these systems
    since I got my first Sun 2/120 -- and had the chance to work as a
    Sysadmin for a few years before I retired.

    Good Luck,
    DoN.

    --
    Remove oil spill source from e-mail
    Email: <> | Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---
     
    DoN. Nichols, Jul 21, 2011
    #6
  7. Eric

    Eric Guest

    On Jul 20, 10:31 pm, "DoN. Nichols" <> wrote:
    > On 2011-07-20, Eric <> wrote:
    >
    >
    > >>         Note, BTW, that the weird characters in there are part of the
    > >> variable names, and indicate that the contents are something other than
    > >> a plain string.  A '?' indicates that it is expecting either "true" or
    > >> "false", and a '#' indicates that it is expecting a number value.

    >
    > > Makes sense.  There's also that thing where you have to spell "True"
    > > with an u/c "T" and "false" with a l/c "f" or vice versa which I guess
    > > is their way of verifying intent.

    >
    >         Hmm ... none of the systems which I have show anything other
    > than lower case for both "true" and "false".  Did I perhaps add that
    > case distinction when I was typing -- or could it be that your system
    > has it and mine does not (different versions of OBP)?
    >


    Probably differences in OBP. My u/c, l/c comment was based on my
    recent experience. If I typed in the switch with the wrong case, OBP
    wouldn't accept it. I think one of the Ultras has an OBP version of
    3.25 and the other is version 3.33. I have a third,
    production, machine which I think is even older, I want to say 3.15
    but I really don't recall.

    >
    >
    > >> > How about another approach.  It turns out I have SunVTS on this thing
    > >> > (ver 4.0 to
    > >> > go w/ Solaris 8).  In multiuser mode, under CDE I'm running a
    > >> > "functional mode"
    > >> > test with pmemtest as well as all the processor tests.  I plan on
    > >> > running this test
    > >> > overnight.  If I'm trying to smoke out bad memory, is this a good way
    > >> > to do it?

    >
    > >>         I don't find "pmemtest" on my Solaris 10.  Nor do I find it on
    > >> systems running Solaris 2.6, so I have no idea what it really does, but
    > >> I suspect that it can't touch the memory in which the kernel is running.

    >
    > > On sunvts 4.0, which I ran under the CDE environment, using the
    > > "logical" system map you can expand the "memory" item and select
    > > "vmemtest" which tests physical memory and swap or you can select
    > > "pmemtest" which tests only physical memory.  The setup dialog for the
    > > pmemtest says you can test 100% of the memory but the doc says it's a
    > > read-only test so I don't know how useful that is.  From looking at
    > > the docs it appears sunvts 6.0 (which is what Solaris 10 uses) is set
    > > up the same way.

    >
    >         O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
    > perhaps?  If so, I don't have it.  Did it come from the distribution
    > CD-ROM, or from a service contract?  I don't have the latter, so I am
    > unlikely to have it.
    >


    Exactly. The Ultras were supplied to us by our instrument's
    manufacturer (we have a mass spec that uses an Ultra for acquisition
    and control). I'm not sure where they got it, might be a separate
    add-on. And unfortunately it seems you can't download anything from
    Oracle unless you have a contract, which really blows.

    > > Last week, after being away for two weeks, I ran the whole bunch of
    > > tests; vts, memtester, POST and now everything checks out fine, no
    > > errors, no nothing.  It's baffling but what can you do?

    >
    >         Compare room temperature when failures occur vs when they
    > don't?
    >


    Good point. The place where I had failures is warmer than where I'm
    testing now but it's only by, I'm guessing, 5 deg F or so. Yeah,
    that's interesting and it's giving me a kind of sick feeling.



    Thanks,
    eric
     
    Eric, Jul 21, 2011
    #7
  8. DoN. Nichols

    DoN. Nichols Guest

    On 2011-07-21, Eric <> wrote:
    > On Jul 20, 10:31 pm, "DoN. Nichols" <> wrote:


    [ ... ]

    >>         Hmm ... none of the systems which I have show anything other
    >> than lower case for both "true" and "false".  Did I perhaps add that
    >> case distinction when I was typing -- or could it be that your system
    >> has it and mine does not (different versions of OBP)?
    >>

    >
    > Probably differences in OBP. My u/c, l/c comment was based on my
    > recent experience. If I typed in the switch with the wrong case OB
    > wouldn't accept it. I think one of the Ultras has an OBP version of
    > 3.25 and the other is version 3.33, I think. I have a third,
    > production, machine which I think is even older, I want to say 3.15
    > but I really don't recall.


    O.K. My Sun Blade 1000 and 2000 systems have:

    OBP 4.16.4 2004/12/18 05:18

    The Sun Fire V120 has:

    CORE 1.0.17 2003/10/06 17:09

    And the SS-5 has:

    OBP 4.16.4 2004/12/18 05:18
    POST 4.16.3 2004/11/05 20:02

    [ ... ]

    >>         O.K. Would "sunvts" be in /opt, something like: "SUNWvts"
    >> perhaps?  If so, I don't have it.  Did it come from the distribution
    >> CD-ROM, or from a service contract?  I don't have the latter, so I am
    >> unlikely to have it.
    >>

    >
    > Exactly. The Ultras were supplied to us by our instrument's
    > manufacturer (we have a mass spec that uses an Ultra for acquisition
    > and control). I'm not sure where they got it, might be a separate add-
    > on. And unfortunately it seems you can't download anything from
    > Oracle unless you have a contract, which really blows.


    Agreed. Having them take over Sun was a disaster.

    >> > Last week, after being away for two weeks, I ran the whole bunch of
    >> > tests; vts, memtester, POST and now everything checks out fine, no
    >> > errors, no nothing.  It's baffling but what can you do?

    >>
    >>         Compare room temperature when failures occur vs when they
    >> don't?
    >>

    >
    > Good point. The place where I had failures is warmer than where I'm
    > testing now but it's only by, I'm guessing, 5 deg F or so. Yeah,
    > that's interesting and it's giving me a kind of sick feeling.


    It may simply argue for more air conditioning where they are
    normally used -- assuming that is an option.

    Good Luck,
    DoN.

    --
    Remove oil spill source from e-mail
    Email: <> | Voice (all times): (703) 938-4564
    (too) near Washington D.C. | http://www.d-and-d.com/dnichols/DoN.html
    --- Black Holes are where God is dividing by zero ---
     
    DoN. Nichols, Jul 21, 2011
    #8