
Re: Why does this memory intensive C++ program get poor memory access speed?

Discussion in 'Hardware' started by Ian Collins, Mar 27, 2010.

  1. Ian Collins

    Ian Collins Guest

    On 03/28/10 07:09 AM, Peter Olcott wrote:
    > MemTest86 and it showed:
    > Intel Core-i5 750 2.67 Ghz (quad core)
    > 32K L1 88,893 MB/Sec
    > 256K L2 37,560 MB/Sec
    > 8 MB L3 26,145 MB/Sec
    > 8.0 GB RAM 11,852 MB/Sec
    >
    > The resulting memory access speed is substantially slower
    > than worst case cache hit ratio should provide.


    From a C++ perspective, bad style? Seriously, this isn't a C++ question.

    <snip>

    > double Process(uint32 size, uint32 RandomSeed = 0) {
    > std::vector<uint32> Data;
    > double MBperSec;
    > double duration;
    > clock_t finish;
    > Data.resize(size);
    > Initialize(Data, size);
    > clock_t start = clock();
    > uint32 num = 0;
    > for (uint32 N = 0; N< Max; N++)
    > num = Data[num];


    With one exception: I'd expect most optimisers to reduce this loop to a
    op-op.

    --
    Ian Collins
     
    Ian Collins, Mar 27, 2010
    #1
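The loop Ian quotes above is easy to try out; a minimal, self-contained sketch is below. The original post did not show the uint32 typedef, the value of Max, or the body of Initialize(), so those parts are assumptions filled in for illustration. The sketch also prints the final value of 'num', which is the portable way to keep the chase from collapsing into the no-op Ian describes.

    // Sketch only: the uint32 typedef, Max, and the fill pattern are
    // assumptions; the original post omitted them.
    #include <cstdint>
    #include <cstdio>
    #include <ctime>
    #include <vector>

    typedef std::uint32_t uint32;

    double Process(uint32 size, uint32 Max)
    {
        std::vector<uint32> Data(size);

        // Link the elements into one cycle so Data[num] always yields
        // another valid index (a stand-in for the OP's Initialize()).
        for (uint32 i = 0; i < size; ++i)
            Data[i] = (i + 1) % size;

        std::clock_t start = std::clock();
        uint32 num = 0;
        for (uint32 N = 0; N < Max; ++N)
            num = Data[num];                  // one dependent load per step
        std::clock_t finish = std::clock();

        double duration = double(finish - start) / CLOCKS_PER_SEC;
        double MBperSec = double(Max) * sizeof(uint32) / (duration * 1e6);

        // Printing 'num' consumes the loop's result; without this (or the
        // checked-iterator side effects discussed below) an optimiser is
        // free to delete the loop entirely.
        std::printf("num = %u, %.1f MB/sec\n", (unsigned)num, MBperSec);
        return MBperSec;
    }

Linking the elements sequentially, as above, keeps the accesses cache friendly; shuffling the links (which is what the OP's RandomSeed parameter suggests) is what produces the much lower figures discussed in this thread.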

  2. Liviu

    Liviu Guest

    Re: Why does this memory intensive C++ program get poor memory access speed?

    "Ian Collins" <> wrote...
    > On 03/28/10 07:09 AM, Peter Olcott wrote:
    >> MemTest86 and it showed:
    >> Intel Core-i5 750 2.67 Ghz (quad core)
    >> 32K L1 88,893 MB/Sec
    >> 256K L2 37,560 MB/Sec
    >> 8 MB L3 26,145 MB/Sec
    >> 8.0 GB RAM 11,852 MB/Sec
    >>
    >> The resulting memory access speed is substantially slower
    >> than worst case cache hit ratio should provide.


    What makes you think that's a "worst case" number? I don't
    see any statement to that effect by memTest86 and, absent
    such, I'd consider it more of a "best case", or maybe "common
    usage pattern" speed, which your test is neither.

    > From a C++ perspective, bad style? Seriously, this isn't
    > a C++ question.


    Right, and sorry, can't help but keep it off-topic ;-) just a few
    comments below, mostly specific to 32b windows and vc++ v9.

    >> double Process(uint32 size, uint32 RandomSeed = 0) {
    >> std::vector<uint32> Data;
    >> double MBperSec;
    >> double duration;
    >> clock_t finish;
    >> Data.resize(size);
    >> Initialize(Data, size);
    >> clock_t start = clock();
    >> uint32 num = 0;
    >> for (uint32 N = 0; N< Max; N++)
    >> num = Data[num];

    >
    > With one exception: I'd expect most optimisers to reduce this loop
    > to a op-op.


    Reading that as a "no-op", and you are right, of course... Just a guess,
    however, since there was no command line or makefile given, but unless
    it had a "#define _SECURE_SCL 0" or equivalent somewhere else,
    the default compile would have used bounds checking for std::vector,
    which referenced 'num' and saved the loop from being optimized away.

    Also, the generated loop had an extra memory access since the compiler
    decided to save 'num' on the stack between iterations. Anyway, running
    similar mockup code (with the array holding pointers, rather than
    offsets, and 'num = Data[num];' replaced by 'pdata = (uint32 *)*pdata;')
    the resulting assembler code had just the intended single read of memory
    in a tight loop. With random addresses, as posted, it registered around
    150MB/sec on my test machine. With sequential reads, instead, (i.e.
    replacing Initialize with 'Data[N] = (uint32)&Data[(N + 1) % size];')
    it went up to 2.5GB/sec, more than 15 times faster.

    My numbers above were on a 5+ year old machine with lesser
    specs than the OP's. For comparison, the PC Wizard memory
    benchmark listed the "memory bandwidth" at around 4.2GB/sec.
    Their page at http://www.cpuid.com/pcwizard.php explicitly says...

    || MEMORY and CACHE: These benchmarks measure
    || the maximum achievable memory bandwidth.

    ...so I did not find the test numbers surprising. After all, hopping
    at random around memory could hardly ever be expected to achieve
    anything near the "maximum bandwidth". It's just another example
    of why "locality of reference" matters with real life caching schemes.

    Liviu
     
    Liviu, Mar 28, 2010
    #2
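Liviu's mockup was not posted in full, so here is a rough reconstruction of the idea he describes: an array whose slots each hold the address of the next slot, chased in either a shuffled or a sequential cycle and timed with clock(). The sizes, the std::mt19937/std::shuffle choice, and the helper name ChaseMBperSec are assumptions for illustration, not his actual (32-bit, VC9-specific) code.

    // Rough reconstruction of the pointer-chasing mockup described above.
    // Names, sizes and the shuffling scheme are assumptions, not Liviu's code.
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <ctime>
    #include <numeric>
    #include <random>
    #include <vector>

    static double ChaseMBperSec(std::size_t size, std::size_t hops, bool randomize)
    {
        // Each slot holds the address of the next slot to visit.
        std::vector<void*> Data(size);

        // Visiting order: 0,1,2,... for sequential, shuffled for random.
        std::vector<std::size_t> order(size);
        std::iota(order.begin(), order.end(), std::size_t(0));
        if (randomize) {
            std::mt19937 gen(12345);
            std::shuffle(order.begin(), order.end(), gen);
        }
        // Link the slots into one big cycle following that order.
        for (std::size_t i = 0; i < size; ++i)
            Data[order[i]] = &Data[order[(i + 1) % size]];

        std::clock_t start = std::clock();
        void* p = &Data[order[0]];
        for (std::size_t i = 0; i < hops; ++i)
            p = *static_cast<void**>(p);      // the single memory read per step
        std::clock_t finish = std::clock();

        double seconds = double(finish - start) / CLOCKS_PER_SEC;
        std::printf("end = %p\n", p);         // consume 'p' so the loop survives -O2
        return double(hops) * sizeof(void*) / 1e6 / seconds;
    }

    int main()
    {
        const std::size_t size = 32 * 1024 * 1024;   // far larger than any cache
        const std::size_t hops = 64 * 1024 * 1024;
        std::printf("random:     %.0f MB/sec\n", ChaseMBperSec(size, hops, true));
        std::printf("sequential: %.0f MB/sec\n", ChaseMBperSec(size, hops, false));
    }

On his older test machine the shuffled walk came in around 150 MB/sec and the sequential one around 2.5 GB/sec, which is the locality-of-reference gap he points out.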
