Need sites, detailed explanation of Dual Channel

Discussion in 'Asus' started by signmeuptoo, Jun 1, 2005.

  1. signmeuptoo

    signmeuptoo Guest

    I have been asked how dual channel RAM works, and I found myself
    at a loss, realizing that I don't really know what it is or exactly
    how it works, especially with the AMD on-chip memory controller.

    Could someone explain it to me, and is there a good website or
    two that covers the topic?

    TIA!
     
    signmeuptoo, Jun 1, 2005
    #1

  2. RonK

    RonK Guest

    RonK, Jun 1, 2005
    #2

  3. Paul

    Paul Guest

    I'm not sure "Dual Channel" is a precise enough term that a
    defensible description can be made for it in all situations.
    Perhaps one of the architecture groups would be a better place
    for the question than a motherboard group.

    One level of distinction I would start with is the width of
    the processor bus versus the width of individual memory DIMMs.
    The P4 and the AthlonXP have 64 bit external interfaces to
    whatever is used for the Northbridge. A DIMM is also 64 bits wide.

    It is a no-brainer figuring out how to connect a 64 bit device
    to another 64 bit device. (We'll ignore how rate matching is
    done when the FSB and memory are running async.)

    Now, using more than one DIMM at a time requires a couple of
    things. Generally, there is a relationship between cache
    organization and the size of data objects fetched from main
    memory: I notice that a single fetch from the memory controller
    is not larger than a single cache line. So, to profitably use
    dual channel memory, the cache must also be sized to handle
    what is coming from main memory. (In other words, not just any
    processor can benefit from a dual channel organization; if I
    tried to glue quad or octal channels onto today's processors,
    nothing good would happen. The performance would suck, as most
    of the time spent on each channel would be wasted on overhead.)

    If I want to use two channels, I have to figure out how to organize
    (interleave) memory addresses. For example, I could have bytes 0-7
    on one DIMM, then bytes 8-15 on the second DIMM. When I go back to the
    first DIMM again, I'd be looking for bytes 16-23 and so on. The
    overhead of setting up a memory operation on a DIMM is large enough
    that a burst of data is requested from memory. The processor will
    be doing cache line sized operations to the memory, because to work
    on a single byte somewhere in main memory would be a super-expensive
    way to do business.
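
    Here is a rough Python sketch of that address-to-channel mapping.
    The 8 byte granularity and the two channel count come from the
    example above; the function name is just something I made up.

    # Sketch of dual channel interleaving: 8-byte words (one 64-bit
    # DIMM width each) alternate between two channels.
    def channel_for_address(addr, width=8, channels=2):
        """Return (channel, address within that channel's DIMM)."""
        beat = addr // width               # which 8-byte word this is
        channel = beat % channels          # alternate 0, 1, 0, 1, ...
        local = (beat // channels) * width + (addr % width)
        return channel, local

    # Bytes 0-7 land on channel 0, bytes 8-15 on channel 1, and
    # bytes 16-23 back on channel 0, as described above.
    for addr in (0, 8, 16, 24):
        print(addr, channel_for_address(addr))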

    So, what makes a situation "Dual Channel"? These are the
    attributes that come to mind.

    1) The memory channels are autonomous. In other words, I can plug a
    DIMM into channel0. Or, I can plug a DIMM into channel1. The
    channels have equal significance, and one channel is
    indistinguishable from the other.

    2) When a cache line sized operation is required by the processor,
    the two channels alternate supplying information to the
    processor. This allows, say, the 3.2GB/sec theoretical bandwidth
    of a DDR400 DIMM to be added to the 3.2GB/sec bandwidth of a
    second DIMM.

    3) How flexible the memory controller is, in terms of mixing DIMMs,
    doesn't influence the "Dual Channel" moniker. Some dual channel
    controllers allow mixing DIMMs on each channel, and operate in
    interleaved mode, as long as there are equal quantities of RAM
    on each channel. Others require exact matched DIMMs, perhaps
    placed in particular slots. None of that matters to this
    discussion.

    So, I guess the key distinguishing feature, is the ability to
    interleave memory addresses, storing bytes 0-7, 16-23, 32-39 in
    one channel, and 8-15, 24-31, 40-47 in the other channel.

    Now, in all of this, I didn't mention "rate matching". If the
    bandwidth is not balanced between the processor and the memory,
    then data could "pile up" or "run dry". A well matched situation
    would be a FSB800 processor (6.4GB/sec bus interface) talking to
    two DDR400 (3.2GB/sec) DIMMs. Some kind of control mechanism
    must be in place, to pace the data. Either a FIFO queue temporarily
    holds the data somewhere, or the individual words of data in the
    burst are occasionally delayed by a cycle, until the consumer of
    the data is ready for that data word. This is outside the discussion
    of dual channel (and for motherboards, I doubt I could answer the
    question anyway).
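
    To make the FIFO idea concrete, here is a toy Python model using
    the FSB800 / dual DDR400 numbers above. The per-cycle word counts
    and the queue itself are purely illustrative; real Northbridge
    internals are not documented at this level.

    from collections import deque

    MEM_WORDS_PER_CYCLE = 2   # two DDR400 channels, one 8-byte word each
    FSB_WORDS_PER_CYCLE = 2   # the 6.4GB/sec bus drains two words a cycle

    fifo = deque()
    for cycle in range(8):
        fifo.extend(["word"] * MEM_WORDS_PER_CYCLE)        # memory fills
        for _ in range(min(FSB_WORDS_PER_CYCLE, len(fifo))):
            fifo.popleft()                                 # processor drains
        print(f"cycle {cycle}: {len(fifo)} words queued")  # stays 0: matched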

    There are a couple of examples we could go through, to see some
    of the issues around dual channel. First, we'll start with the
    Nforce2.

    On the Nforce2, if we set up the system to be synchronous, in
    fact the memory subsystem has twice the theoretical bandwidth
    of the processor interface. And, that means dual channel will be
    a waste, in the sense that on a read, the burst of data from
    the Northbridge must be slowed to match the processor. That is
    why you don't see a big difference between single channel and
    dual channel mode in synchronous operation. (The excess of
    data is still handy though, because a memory bus is seldom
    100% efficient, and dead cycles occasionally happen on a channel.
    The dual channels might have data to transfer, due to the
    excess. So a slot might get filled that would otherwise be
    empty. Typically, users see a 5% difference on an average
    application.)

    Where dual channel would really pay off on an Nforce2, is
    precisely where most people would not run it. Say you had
    two PC2100 DIMMs and an FSB400 processor. If the DIMMs
    were run single channel, it would be 2.1GB/sec memory
    bandwidth versus 3.2GB/sec processor bandwidth. In dual
    channel mode, the memory offers 4.2GB/sec, and there would
    be a significant difference between those two configs.
    But, only an idiot would buy PC2100 DIMMs today, so the
    Nforce2 doesn't have an opportunity to shine.
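
    A back-of-the-envelope way to see both Nforce2 cases at once is
    to treat the usable rate as capped by the slower of the memory and
    the processor bus. These are theoretical peaks only, which the 5%
    figure above shows is optimistic.

    def effective_gbps(mem_gbps, fsb_gbps):
        # Crude model: throughput is limited by the slower side.
        return min(mem_gbps, fsb_gbps)

    # Synchronous case: FSB400 CPU (3.2GB/sec) with DDR400 DIMMs.
    print(effective_gbps(3.2, 3.2))       # single channel -> 3.2
    print(effective_gbps(2 * 3.2, 3.2))   # dual channel -> still 3.2

    # PC2100 case: FSB400 CPU with 2.1GB/sec DIMMs.
    print(effective_gbps(2.1, 3.2))       # single channel -> 2.1
    print(effective_gbps(2 * 2.1, 3.2))   # dual channel -> 3.2, a real win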

    The second case I'll bring up, is the one that prompted
    your question. That is, what is the situation on
    Opteron/Athlon64 ? First of all, a dead giveaway, is if
    you download documents from AMD, they don't use the
    term "dual channel" in their technical descriptions.
    Instead they use "64 bit mode" and "128 bit mode" when
    describing memory. That should tell you right away,
    that something is up.

    One thing to notice about A64, is DIMMs cannot live in
    A1 and A2 by themselves. First you have to populate a
    B1 or B2 slot, before an A slot can be used. That violates
    (1) above.
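
    Expressed as a quick Python check (the slot names are the ones
    above; the rule is exactly as stated, nothing more):

    def population_ok(filled_slots):
        """An A slot may only be used once at least one B slot is filled."""
        has_a = any(s.startswith("A") for s in filled_slots)
        has_b = any(s.startswith("B") for s in filled_slots)
        return has_b or not has_a

    print(population_ok({"B1"}))          # True - a B slot alone is fine
    print(population_ok({"A1"}))          # False - A1 cannot live by itself
    print(population_ok({"A1", "B1"}))    # True - the usual 128 bit pair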

    I suspect what is going on there, is the processor internal
    organization is no longer the 64 bit width we are accustomed
    to, on the P4 and AthlonXP. If the processor were 128 bits
    wide (or the bandwidth equivalent thereof), then there
    is no need to interleave accesses to the DIMMs. When a
    DIMM is present in A1 and B1, the same command is sent to
    both, and it is as if the DIMM is 128 bits wide. Yes, the
    data organization of the DIMMs still looks like the
    interleaving on a dual channel situation, but in this
    case, the processor can "eat" data simultaneously from
    both DIMMs. Since there is no longer a "bottleneck" at
    the processor interface, there is no notion of interleaving
    (the interleaving doesn't have a physical significance, in
    this case it is all internal and hidden from us).

    Well, how does 64 bit mode work then? One way to do it would be
    to make two reads to a DIMM and glue the two 8 byte quantities
    together, making a 128 bit wide word for the processor's
    (internal) interface.
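
    Here is how I imagine the two modes, sketched in Python. The
    DIMMs are modeled as plain byte arrays and the function names
    are mine, not AMD's - this is a guess at the behavior, not
    their design.

    dimm_b1 = bytearray(range(0, 128))     # pretend contents
    dimm_a1 = bytearray(range(128, 256))

    def fetch_128bit_mode(offset):
        # Same command to both DIMMs; they answer side by side,
        # as if one 128 bit wide DIMM.
        return bytes(dimm_b1[offset:offset+8]) + bytes(dimm_a1[offset:offset+8])

    def fetch_64bit_mode(offset):
        # Single DIMM: two back-to-back 8 byte reads glued together,
        # at roughly half the bandwidth.
        return bytes(dimm_b1[offset:offset+8]) + bytes(dimm_b1[offset+8:offset+16])

    assert len(fetch_128bit_mode(0)) == len(fetch_64bit_mode(0)) == 16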

    Does A64 benefit from 128 bit mode ? Absolutely. Those
    Sandra memory benchmarks don't lie :)

    I don't know if any AMD documents dwell on details like this,
    and this is the best I can do to sketch how it _might_ work.
    The AMD processor actually contains a memory controller,
    a crossbar, and multiple HyperTransport links. Whether the
    processor is an Athlon64, or one of the several types of
    Opteron, determines how many of the HT links are connected
    to pins on the processor. The crossbar is a
    routing device that decides which interface will satisfy
    a request for data. I have no idea what the internal
    organization of the crossbar is - in Opteron, there are a
    lot of cache protocol issues going on in there, and architects
    don't tend to dwell on the tiny issues, like bus widths, in
    a discussion like that. I would say, the maximum width
    of a bus inside A64, would be the size of a cache line
    (a safe bet) - whatever that is.

    If you want to "wallow in the architecture", try this site.
    I don't want to rewrite any of the fine material the
    author has provided here. (Note - if you are on a dialup
    modem, this will take a while to load.)

    http://www.chip-architect.com/news/2003_09_21_Detailed_Architecture_of_AMDs_64bit_Core.html

    To sum up - Athlon64/Opteron is not dual channel. Two
    DIMMs operating in 128 bit mode, simply matches the
    internal organization of the processor better.

    Paul
     
    Paul, Jun 1, 2005
    #3