Motherboard Forums



Re: Why does this memory intensive C++ program get poor memory access speed?

 
 
Ian Collins
      03-27-2010, 10:50 PM
On 03/28/10 07:09 AM, Peter Olcott wrote:
> MemTest86 and it showed:
> Intel Core-i5 750 2.67 Ghz (quad core)
> 32K L1 88,893 MB/Sec
> 256K L2 37,560 MB/Sec
> 8 MB L3 26,145 MB/Sec
> 8.0 GB RAM 11,852 MB/Sec
>
> The resulting memory access speed is substantially slower
> than worst case cache hit ratio should provide.


From a C++ perspective, bad style? Seriously, this isn't a C++ question.

<snip>

> double Process(uint32 size, uint32 RandomSeed = 0) {
> std::vector<uint32> Data;
> double MBperSec;
> double duration;
> clock_t finish;
> Data.resize(size);
> Initialize(Data, size);
> clock_t start = clock();
> uint32 num = 0;
> for (uint32 N = 0; N< Max; N++)
> num = Data[num];


With one exception: I'd expect most optimisers to reduce this loop to a
op-op.

--
Ian Collins
 
 
 
 
 
Liviu
      03-28-2010, 10:51 PM
"Ian Collins" wrote...
> On 03/28/10 07:09 AM, Peter Olcott wrote:
>> MemTest86 and it showed:
>> Intel Core-i5 750 2.67 Ghz (quad core)
>> 32K L1 88,893 MB/Sec
>> 256K L2 37,560 MB/Sec
>> 8 MB L3 26,145 MB/Sec
>> 8.0 GB RAM 11,852 MB/Sec
>>
>> The resulting memory access speed is substantially slower
>> than worst case cache hit ratio should provide.


What makes you think that's a "worst case" number? I don't
see any statement to that effect by memTest86 and, absent
such, I'd consider it more of a "best case", or maybe "common
usage pattern" speed, which your test is neither.

> From a C++ perspective, bad style? Seriously, this isn't
> a C++ question.


Right, and sorry, can't help but keep it off-topic ;-) Just a few
comments below, mostly specific to 32-bit Windows and VC++ v9.

>> double Process(uint32 size, uint32 RandomSeed = 0) {
>> std::vector<uint32> Data;
>> double MBperSec;
>> double duration;
>> clock_t finish;
>> Data.resize(size);
>> Initialize(Data, size);
>> clock_t start = clock();
>> uint32 num = 0;
>> for (uint32 N = 0; N< Max; N++)
>> num = Data[num];

>
> With one exception: I'd expect most optimisers to reduce this loop
> to a op-op.


Reading that as a "no-op", and you are right, of course... Just a guess,
however, since there was no command line or makefile given, but unless
it had a "#define _SECURE_SCL 0" or equivalent somewhere else,
the default compile would have used bounds checking for std::vector,
which referenced 'num' and saved the loop from being optimized away.
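
To illustrate the point about the loop surviving only while something still uses 'num', here is a minimal sketch (not the OP's code) in which the result of the chase is returned and stored to a volatile, so the optimizer cannot treat the loop as dead. The names RunChain, RunAndKeep and g_sink are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sink: a volatile store is observable behaviour, so the
// value written to it has to actually be computed.
volatile std::uint32_t g_sink = 0;

// Follows the chain 0 -> Data[0] -> Data[Data[0]] -> ... for 'steps' hops
// and returns the last index reached. Every entry must be a valid index.
std::uint32_t RunChain(const std::vector<std::uint32_t>& Data, std::size_t steps) {
    assert(!Data.empty());
    std::uint32_t num = 0;
    for (std::size_t N = 0; N < steps; ++N)
        num = Data[num];  // each load depends on the previous one
    return num;
}

// Runs the loop and publishes the result so that it stays live.
void RunAndKeep(const std::vector<std::uint32_t>& Data, std::size_t steps) {
    g_sink = RunChain(Data, steps);
}
```

Printing the returned value at the end of the program would probably do the same job; the volatile just keeps the timing code free of I/O.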

Also, the generated loop had an extra memory access since the compiler
decided to save 'num' on the stack between iterations. Anyway, running
similar mockup code (with the array holding pointers, rather than
offsets, and 'num = Data[num];' replaced by 'pdata = (uint32 *)*pdata;')
the resulting assembler code had just the intended single read of memory
in a tight loop. With random addresses, as posted, it registered around
150MB/sec on my test machine. With sequential reads, instead, (i.e.
replacing Initialize with 'Data[N] = (uint32)&Data[(N + 1) % size];')
it went up to 2.5GB/sec, more than 15 times faster.
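
For anyone who wants to try the random vs. sequential comparison themselves, below is a rough sketch of that kind of test. It is not the mockup I actually ran: it keeps offsets in the array instead of pointers (so it also builds as 64-bit), and all function names are invented for the example. The random chain is built from a shuffled visiting order, which is my assumption about a fair way to touch every element exactly once per pass.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Sequential chain: element N holds the index N + 1, wrapping at the end.
std::vector<std::uint32_t> MakeSequentialChain(std::uint32_t size) {
    std::vector<std::uint32_t> next(size);
    for (std::uint32_t N = 0; N < size; ++N)
        next[N] = (N + 1) % size;
    return next;
}

// Random chain: one cycle through all elements in a shuffled order.
std::vector<std::uint32_t> MakeRandomChain(std::uint32_t size, std::uint32_t seed) {
    std::vector<std::uint32_t> order(size);
    std::iota(order.begin(), order.end(), 0u);
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    std::vector<std::uint32_t> next(size);
    for (std::uint32_t i = 0; i < size; ++i)
        next[order[i]] = order[(i + 1) % size];
    return next;
}

// The loop being timed: one dependent read per iteration.
std::uint32_t Chase(const std::vector<std::uint32_t>& next, std::size_t steps) {
    assert(!next.empty() || steps == 0);
    std::uint32_t num = 0;
    for (std::size_t N = 0; N < steps; ++N)
        num = next[num];
    return num;
}

// Throughput of Chase in MB/sec (1 MB taken as 1e6 bytes here).
double MeasureMBPerSec(const std::vector<std::uint32_t>& next, std::size_t steps) {
    const auto start = std::chrono::steady_clock::now();
    volatile std::uint32_t sink = Chase(next, steps);  // keep the result live
    const auto finish = std::chrono::steady_clock::now();
    (void)sink;
    const double seconds = std::chrono::duration<double>(finish - start).count();
    const double megabytes = static_cast<double>(steps) * sizeof(std::uint32_t) / 1e6;
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}
```

Call MeasureMBPerSec once on each kind of chain, with an array much larger than the last-level cache and a step count large enough to run for a second or so. The absolute numbers will depend on the machine and compiler settings, so only the ratio between the two is worth comparing.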

My numbers above were on a 5+ year old machine with lesser
specs than the OP's. For comparison, the PC Wizard memory
benchmark listed the "memory bandwidth" at around 4.2GB/sec.
Their page at http://www.cpuid.com/pcwizard.php explicitly says...

|| MEMORY and CACHE: These benchmarks measure
|| the maximum achievable memory bandwidth.

...so I did not find the test numbers surprising. After all, hopping
at random around memory could hardly ever be expected to achieve
anything near the "maximum bandwidth". It's just another example
of why "locality of reference" matters with real life caching schemes.
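
To put the MB/sec figures in per-read terms, here is a small back-of-the-envelope helper. It assumes 1 MB = 1e6 bytes and 4-byte reads (a uint32 per iteration); the function name is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Average time per read, in nanoseconds, for a given throughput.
// Assumes 1 MB = 1e6 bytes.
double NanosecondsPerRead(double mbPerSec, std::size_t bytesPerRead) {
    assert(mbPerSec > 0.0 && bytesPerRead > 0);
    const double readsPerSec = mbPerSec * 1e6 / static_cast<double>(bytesPerRead);
    return 1e9 / readsPerSec;
}
```

With the numbers above, 150MB/sec works out to roughly 27 ns per 4-byte read, while 2.5GB/sec is about 1.6 ns per read.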

Liviu


 