I enabled ECC after booting Linux on Athlon64

Discussion in 'Asus' started by Jaakko Hyvätti, Jun 21, 2005.

  1. Hi all,

    Until now, I thought it was almost impossible to enable
    ECC functionality after the BIOS has booted the computer.
    Many people on the ECC mailing list have confirmed that,
    and I have not known of anyone even attempting it
    seriously. The problem is that all of memory has to be
    written with the correct checksum mode enabled before the
    checking and correction of errors can be enabled. This is
    usually done in the BIOS, as no other processes are
    running at that point.

    Today, 2005-06-21, however, I noticed that the Athlon64
    hardware scrubbing feature allows all of memory to be
    initialized before the checking of memory reads is
    enabled. The scripts below have worked for me. I have
    verified with ecc.pl that the correct ECC modes are
    enabled, and that single-bit errors introduced by
    grounding memory data leads are reported correctly in the
    CPU north bridge registers.
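
To make the ordering concrete: the scrubber has to complete at least one full pass over the installed memory before read checking is switched on. The following Python sketch only estimates how long such a pass might take. It is not taken from ecc-start.sh; the 64-byte scrub granularity and the 40 ns interval are assumptions that should be checked against the AMD documentation for your CPU revision, and `full_scrub_seconds` is a hypothetical helper name.

```python
def full_scrub_seconds(mem_bytes, interval_ns, line_bytes=64):
    """Estimate the time for one complete scrubber pass over memory.

    Assumptions (not from the original post): the scrubber handles one
    line_bytes-sized block per scrub event, and events are interval_ns apart.
    """
    if mem_bytes <= 0 or interval_ns <= 0 or line_bytes <= 0:
        raise ValueError("all arguments must be positive")
    events = -(-mem_bytes // line_bytes)  # ceiling division
    return events * interval_ns * 1e-9


# Example: 1 GiB of RAM at an assumed 40 ns scrub interval.
one_pass = full_scrub_seconds(1 << 30, 40)
print(f"one full pass takes about {one_pass:.2f} s")
```

Waiting a comfortable multiple of this estimate before enabling error checking would be the cautious choice.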

    My motherboard is an Asus A8N-SLI Deluxe, which would be a
    perfect building block for a computational node if the
    BIOS initialized ECC correctly. It does not. Maybe this
    helps.

    These scripts are at http://www.iki.fi/hyvatti/sw/ :

    * ecc-start.sh enables the Athlon64 and Opteron 64+8 bit
    ECC at runtime.

    * ecc-chip-kill-start.sh enables the Athlon64 and Opteron
    128+16 bit Chip-Kill ECC at runtime. You must have a
    128 bit memory configuration for this to work.

    * ecc-stop.sh stops ECC operation.

    * ecc.pl reads the AMD64 CPU integrated north bridge
    registers and displays them bit by bit, with names and
    descriptions. The latest version (2005-06-21) includes
    tables of ECC syndrome codes, but using them to decode
    the registers is not yet implemented.
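
As a rough illustration of what a bit-by-bit register dump like ecc.pl involves, here is a minimal Python sketch, not the author's Perl code. It assumes that the node 0 north bridge appears as PCI device 00:18, function 3, that Linux exposes its configuration space at the sysfs path below, and that the EccEn and ChipKillEccEn flags are bits 22 and 23 of the register at offset 0x44; verify these against AMD document 26094 before relying on them. All function names are hypothetical.

```python
import struct

# Assumed location of the node 0 north bridge, function 3 (reading past the
# first 64 bytes of configuration space normally requires root).
NB_F3_CONFIG = "/sys/bus/pci/devices/0000:00:18.3/config"

# Assumed register offset and bit positions; check them against AMD 26094.
MCA_NB_CONFIG = 0x44
FIELDS = {"EccEn": 22, "ChipKillEccEn": 23}


def read_dword(config, offset):
    """Return the little-endian 32-bit register at offset in raw config bytes."""
    return struct.unpack_from("<I", config, offset)[0]


def decode_fields(value, fields):
    """Map each field name to the value (0 or 1) of its bit in value."""
    return {name: (value >> bit) & 1 for name, bit in fields.items()}


def report(path=NB_F3_CONFIG):
    """Read the register from sysfs and print each named bit."""
    with open(path, "rb") as f:
        config = f.read()
    value = read_dword(config, MCA_NB_CONFIG)
    print(f"offset {MCA_NB_CONFIG:#04x} = {value:#010x}")
    for name, bit in decode_fields(value, FIELDS).items():
        print(f"  {name}: {bit}")
```

Calling `report()` as root on a suitable machine would print the two flags; `decode_fields` can also be exercised offline on any integer.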

    The ECC functionality can be tested with the following
    steps:

    1. Use ecc-start.sh or ecc-chip-kill-start.sh.

    2. Run ecc.pl to see that ECC is enabled and that the
    syndrome registers are zero.

    3. Short-circuit a data line, or preferably a parity bit
    data line, on one of the DDR memory modules to ground
    for a short period. For example, pin 49 (parity bit 2,
    i.e. bit 66) and pin 51 (parity bit 3, i.e. bit 67) are
    fine. The pin between them is ground. Count the pins
    starting from the marked pin 1 on the DIMM. It is easy
    to push a lead into the holes next to the socket pins
    of the DIMM socket. I used a 10 ohm resistor to reduce
    the probability of damage. However, I am not
    responsible for any damage you cause to your computer.
    It can break completely.

    4. Run ecc.pl to see if the syndrome codes are there. If
    not, repeat the previous step.

    5. Decode the syndrome code with AMD document 26094.PDF or
    the table in ecc.pl. See if it corresponds with the
    data lead you just shorted.
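
The comparison in step 5 can be written down as a small lookup. This Python sketch uses only the two example pins given in step 3 and the numbering used there, where parity bit n is bit 64 + n of the 72-bit word; the names `PIN_TO_PARITY_BIT`, `expected_bit` and `matches` are hypothetical, and pins not mentioned above are deliberately left out.

```python
# Example pins from step 3: DIMM pin number -> parity (check) bit index.
PIN_TO_PARITY_BIT = {49: 2, 51: 3}

DATA_BITS = 64  # parity bits follow the 64 data bits, per the numbering above


def expected_bit(pin):
    """Return the bit number that should be reported for a shorted pin."""
    if pin not in PIN_TO_PARITY_BIT:
        raise KeyError(f"pin {pin} is not in the example table")
    return DATA_BITS + PIN_TO_PARITY_BIT[pin]


def matches(pin, decoded_bit):
    """True if the bit decoded from the syndrome is the one that was shorted."""
    return expected_bit(pin) == decoded_bit
```

For example, after shorting pin 49, `matches(49, bit)` should hold for the bit number obtained from the syndrome table.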


    I hope this information is of use to at least some bluesmoke and ecc
    users. Please let me know if you find out anything more on this subject
    or can verify my results. Maybe this code can be included in the
    bluesmoke distribution in some form.

    Regards,
    Jaakko
     
    Jaakko Hyvätti, Jun 21, 2005