Motherboard Forums



Speed problems with ARM7, more detailed post..

 
 
webwraith067
 
      02-04-2004, 07:47 AM
I have built a fully functional ARM7 prototype board based on the Atmel
AT91R40008 processor. Everything works fine, but the performance of the
processor is approximately 1/10th of what it should be. In a simple
in-SRAM memory write test, I first copy my code to SRAM, then run out of
SRAM and write blocks of 32 bytes to consecutive locations in an unrolled
loop, for a total of 9600 bytes (a simple test buffer). I then repeat this
loop 8 times so the scope can get a good lock. The original C/C++ code and
the disassembled ARM code are below for reference. The key point is that,
other than the looping overhead, the instruction stream should be nothing
but fetch, decode and execute of store-byte-immediate instructions to
internal SRAM, of the form:

STRB Rn,[ip,#dd]

At worst this should take 1-3 cycles per operation, yet when I scope it I
get a memory write approximately every 40 ("FORTY") cycles!!!! This is
bizarre. Of course the External Bus Interface settings are irrelevant for
the internal bus, and I am not pulling on the external nWait pin. My
hypothesis is that the processor is in some mode after reset that makes it
run slower? Maybe it has something to do with the debug interface; I am
not sure. Nothing I have found in all 3000+ pages of ARM docs leads me to
any conclusions...

As another brief example, here is the C/C++ code for a max-speed I/O
toggle. I basically have a scope on one of the I/O pins, toggle it in a
loop at max speed, and then look at the waveform:


******** C/C++ code

while (1)
{
    pio_base_ptr[PIO_SODR/4] = 0x00020000;   // set bit 17 (Set Output Data Register)
    pio_base_ptr[PIO_CODR/4] = 0x00020000;   // clear bit 17 (Clear Output Data Register)
}

And here is the disassembled ARM code: 5 instructions, yet it is taking
nearly 400 clocks to run them! Again, it is running out of SRAM and
nothing else. Bizarre???

************* ARM CODE

|L000630.J10.C_Entry|
LDR a2,[v2,#4]
STR a1,[a2,#&30]!
LDR a2,[v2,#4]
STR a1,[a2,#&34]!
B |L000630.J10.C_Entry|
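As a rough sanity check, the ideal cost of this loop can be summed from the ARM7TDMI instruction timings (LDR = 1S+1N+1I, STR = 2N, B = 2S+1N). The sketch below assumes every S, N and I cycle costs exactly one clock, i.e. zero-wait-state memory, and ignores any extra wait states the peripheral bus may add for the PIO writes; it gives a lower bound, not the chip's measured timing. The names `op_t`, `ideal_cycles` and `loop_cycles` are illustrative only.

```c
#include <stddef.h>

/* Instruction classes appearing in the toggle loop. */
typedef enum { OP_LDR, OP_STR, OP_B } op_t;

/* Ideal ARM7TDMI cycles per instruction, ASSUMING one clock per
 * S, N and I cycle (zero wait states):
 *   LDR = 1S + 1N + 1I = 3
 *   STR = 2N           = 2
 *   B   = 2S + 1N      = 3  (pipeline refill after the branch) */
unsigned ideal_cycles(op_t op)
{
    switch (op) {
    case OP_LDR: return 3;
    case OP_STR: return 2;
    case OP_B:   return 3;
    }
    return 0;
}

/* Sum the ideal cycles for a sequence of instructions. */
unsigned loop_cycles(const op_t *ops, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++)
        total += ideal_cycles(ops[i]);
    return total;
}

/* The five instructions of |L000630.J10.C_Entry|, in order. */
const op_t toggle_loop[] = { OP_LDR, OP_STR, OP_LDR, OP_STR, OP_B };
```

For the five-instruction loop this comes to 3 + 2 + 3 + 2 + 3 = 13 clocks, so the ~400 clocks reported above would be roughly 30 times the ideal figure.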

There are very few resources with HARDCORE info; any insight would be
greatly appreciated.

Desperately seeking a GURU,


Xander.
(E-Mail Removed)


*********** C/C++ version of the memory fill

// fill memory up with incremental values

for (t = 0; t < 8; t++)
{
    for (ram_index = 0; ram_index < 9600/1-32; ram_index += 32)
    {
        work_ptr[ram_index+0]  = 1;
        work_ptr[ram_index+1]  = 2;
        work_ptr[ram_index+2]  = 3;
        work_ptr[ram_index+3]  = 4;
        work_ptr[ram_index+4]  = 1;
        work_ptr[ram_index+5]  = 2;
        work_ptr[ram_index+6]  = 3;
        work_ptr[ram_index+7]  = 4;
        work_ptr[ram_index+8]  = 1;
        work_ptr[ram_index+9]  = 2;
        work_ptr[ram_index+10] = 3;
        work_ptr[ram_index+11] = 4;
        work_ptr[ram_index+12] = 1;
        work_ptr[ram_index+13] = 2;
        work_ptr[ram_index+14] = 3;
        work_ptr[ram_index+15] = 4;
        work_ptr[ram_index+16] = 1;
        work_ptr[ram_index+17] = 2;
        work_ptr[ram_index+18] = 3;
        work_ptr[ram_index+19] = 4;
        work_ptr[ram_index+20] = 1;
        work_ptr[ram_index+21] = 2;
        work_ptr[ram_index+22] = 3;
        work_ptr[ram_index+23] = 4;
        work_ptr[ram_index+24] = 1;
        work_ptr[ram_index+25] = 2;
        work_ptr[ram_index+26] = 3;
        work_ptr[ram_index+27] = 4;
        work_ptr[ram_index+28] = 1;
        work_ptr[ram_index+29] = 2;
        work_ptr[ram_index+30] = 3;
        work_ptr[ram_index+31] = 4;
    }
}
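One detail worth checking is how many bytes each pass really writes. `9600/1-32` evaluates to 9568, and because the test is `<`, the last block starts at offset 9536, so a pass covers 9568 bytes rather than the full 9600. The sketch below simply replays the loop bounds; `bytes_written_per_pass` is a hypothetical helper, not part of the original code.

```c
/* Replays the bounds of the unrolled fill loop above and returns how
 * many bytes one pass of the inner loop writes (`block` per iteration). */
unsigned bytes_written_per_pass(unsigned buffer_size, unsigned block)
{
    unsigned bytes = 0;
    /* Same condition as the original: buffer_size/1 - block, with '<'. */
    for (unsigned ram_index = 0; ram_index < buffer_size / 1 - block; ram_index += block)
        bytes += block;
    return bytes;
}
```

With `bytes_written_per_pass(9600, 32)` this gives 9568; writing the condition as `ram_index <= 9600 - 32` would cover all 9600 bytes. This does not explain the slowdown, but it matters when converting scope timings into bytes per second.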


********* ARM ASM version of the memory fill


|L000638.J8.C_Entry|
STR v2,[v4,#&c5c]
MOV a2,#0
STR v2,[v4,#&c60]
|L000644.J10.C_Entry|
MOV a1,#0
|L000648.J11.C_Entry|
STRB v2,[v1,a1]
ADD ip,v1,a1
STRB a4,[ip,#1]
STRB v3,[ip,#2]
STRB lr,[ip,#3]
STRB v2,[ip,#4]
STRB a4,[ip,#5]
STRB v3,[ip,#6]
STRB lr,[ip,#7]
STRB v2,[ip,#8]
STRB a4,[ip,#9]
STRB v3,[ip,#&a]
STRB lr,[ip,#&b]
STRB v2,[ip,#&c]
STRB a4,[ip,#&d]
STRB v3,[ip,#&e]
STRB lr,[ip,#&f]
STRB v2,[ip,#&10]
STRB a4,[ip,#&11]
STRB v3,[ip,#&12]
STRB lr,[ip,#&13]
STRB v2,[ip,#&14]
STRB a4,[ip,#&15]
STRB v3,[ip,#&16]
STRB lr,[ip,#&17]
STRB v2,[ip,#&18]
STRB a4,[ip,#&19]
STRB v3,[ip,#&1a]
STRB lr,[ip,#&1b]
STRB v2,[ip,#&1c]
STRB a4,[ip,#&1d]
STRB v3,[ip,#&1e]
STRB lr,[ip,#&1f]
ADD a1,a1,#&20
CMP a1,a3
BLT |L000648.J11.C_Entry|
ADD a2,a2,#1
CMP a2,#8
BLT |L000644.J10.C_Entry|
B |L000638.J8.C_Entry|
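The same kind of estimate can be applied to the inner fill loop: 32 STRB instructions, ADD ip, ADD a1, CMP and a taken BLT. The sketch assumes the ARM7TDMI figures (store = 2N, data processing = 1S, taken branch = 2S+1N) and zero-wait-state internal SRAM; the macro and function names are illustrative.

```c
/* ASSUMED ARM7TDMI timings with zero-wait-state internal SRAM. */
#define STRB_CYCLES   2u  /* store: 2N */
#define ALU_CYCLES    1u  /* ADD / CMP: 1S */
#define BRANCH_TAKEN  3u  /* BLT taken: 2S + 1N */

/* Ideal cycles for one iteration of |L000648.J11.C_Entry|. */
unsigned inner_iteration_cycles(void)
{
    unsigned stores = 32u * STRB_CYCLES;  /* 32 x STRB (one via [v1,a1], 31 via ip) */
    unsigned alu    = 3u * ALU_CYCLES;    /* ADD ip, ADD a1, CMP */
    return stores + alu + BRANCH_TAKEN;   /* + BLT back to the top */
}

/* Average ideal cycles per byte written. */
double cycles_per_byte(void)
{
    return inner_iteration_cycles() / 32.0;
}
```

That works out to 70 cycles per 32-byte block, or about 2.2 cycles per byte, against the roughly 40 cycles per write seen on the scope, which is why the result looks like a clocking or memory-configuration problem rather than instruction overhead.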
 
 
 
 
 
42Bastian Schick
 
      02-04-2004, 02:11 PM
Short question: did you program the PLL?
Many MCUs don't run at full speed after reset.
---
42Bastian
Do not email to (E-Mail Removed), it's a spam-only account :-)
Use <same-name>@epost.de instead !
 
 
 
 
 
Mark Borgerson
 
      02-04-2004, 04:52 PM
In article <(E-Mail Removed) >,
(E-Mail Removed) says...
> I have built a fully functional ARM7 prototype board based on the
> Atmel AT91R40008 processor. Everything works fine, but the performance
> of the processor is approximately 1/10th what it should be. In a simple
> in-SRAM memory write test, I first copy my code to SRAM, and then run
> out of SRAM and write blocks of 32 bytes to consecutive locations in an
> unrolled loop for a total of 9600 bytes (a simple test buffer), then do
> this loop 8 times, so the scope can get a good lock. The original C/C++
> code and the disassembled ARM code are below for reference. The key
> element is that other than the looping overhead the instruction stream
> should be nothing other than fetch, decode, execute of store byte
> immediate to internal SRAM of the form:
>

<<SNIP>>

Hmm. The Atmel docs do say that byte and word accesses to the internal
RAM are single-cycle operations. However, they also describe a mode that
lets you use the internal RAM to test apps that will eventually go into
flash. I wonder whether that means the processor, when set up that way,
also emulates the wait-state settings of the external bus.


Another question is: if you are running the code in internal
RAM and are reading and storing bytes in internal RAM,
what external signals are you monitoring with the scope?


Mark Borgerson


 
 
Tauno Voipio
 
      02-04-2004, 06:07 PM
webwraith067 wrote:
> I have built a fully functional ARM7 prototype board based on the
> Atmel AT91R40008 processor. Everything works fine, but the performance
> of the processor is approximately 1/10th what it should be. In a simple
> in-SRAM memory write test, I first copy my code to SRAM, and then run
> out of SRAM and write blocks of 32 bytes to consecutive locations in an
> unrolled loop for a total of 9600 bytes (a simple test buffer), then do
> this loop 8 times, so the scope can get a good lock.


If you're accessing the internal RAM, you won't see external bus
cycles for those accesses, so your scoping results may not be valid.

Also, on a 32-bit RISC core, you should test aligned 32-bit
memory accesses, not bytes. Use STMIA instead of STRB.
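In C, the word-wide version of the test can be sketched by packing the repeating 1,2,3,4 byte pattern into one 32-bit word and storing whole aligned words. Whether the compiler emits STR or STMIA for the loop depends on the compiler and its optimisation settings; `fill_words` is a hypothetical name, and `dst` is assumed to be 4-byte aligned.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Fill `nwords` aligned 32-bit words with the byte sequence 1,2,3,4.
 * memcpy builds the word from bytes, so the in-memory byte order is
 * 1,2,3,4 whether the core is configured big- or little-endian. */
void fill_words(uint32_t *dst, size_t nwords)
{
    static const uint8_t pattern[4] = { 1, 2, 3, 4 };
    uint32_t word;
    memcpy(&word, pattern, sizeof word);

    for (size_t i = 0; i < nwords; i++)
        dst[i] = word;  /* one word store per 4 bytes instead of 4 STRBs */
}
```

Filling the 9600-byte test buffer this way takes 2400 word stores per pass instead of 9600 byte stores.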

HTH

Tauno Voipio
tauno voipio @ iki fi

PS.

I'd start the speed test by building a simple I/O bit on/off
loop, measuring its overhead, and then adding the instructions to
be tested between the on and off writes.
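Tauno's method can be sketched as a small wrapper: raise a pin, run the code under test, drop the pin, and compare the pulse width on the scope against a baseline run with an empty payload. The struct and function names are hypothetical; on the AT91 the two pointers would refer to the PIO_SODR and PIO_CODR registers, whose addresses are deliberately not filled in here.

```c
#include <stdint.h>

/* Hypothetical handles for a PIO set/clear register pair. */
typedef struct {
    volatile uint32_t *sodr;  /* writing 1s drives the masked pins high */
    volatile uint32_t *codr;  /* writing 1s drives the masked pins low  */
    uint32_t mask;            /* the pin the scope is watching */
} probe_pin_t;

/* Baseline payload: calling through the pointer costs the same as for a
 * real payload, so the baseline should call this rather than skip the call. */
void empty_payload(void) { }

/* Pulse width = fixed overhead + payload time. Scope it once with
 * empty_payload, then with the real payload, and subtract. */
void time_payload(const probe_pin_t *pin, void (*payload)(void))
{
    *pin->sodr = pin->mask;   /* pin high: start of measurement */
    payload();
    *pin->codr = pin->mask;   /* pin low: end of measurement */
}
```

Keeping the call through a function pointer in both runs means the call overhead cancels out in the subtraction.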

I have not noticed the slowness you describe with the AT91R40008,
and I have built several projects with AT91s. You may be
measuring the wrong thing.

TV

 
 
Ulf Samuelsson
 
      02-04-2004, 06:15 PM


"42Bastian Schick" <(E-Mail Removed)> skrev i meddelandet
news:(E-Mail Removed)...
> Short question: You did program the PLL ?
> Many MCU don't run at full speed after reset.
> ---
> 42Bastian
> Do not email to (E-Mail Removed), it's a spam-only account :-)
> Use <same-name>@epost.de instead !


The AT91R40008 has neither a PLL nor an internal oscillator.
You feed the crystal oscillator signal directly to the chip.

Check the wait-state programming.
How is the remap function handled?
The SRAM should be moved to address zero by the remap function.
Check that you are not programming an EBI register by mistake.

In short:
Initialize the chip EXACTLY as it is done on the EB40A.
DON'T fool around with anything "clever" until the remap has completed.
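A quick way to follow up on the remap question is to check, on the target, that both the running code and the test buffer really sit in internal SRAM. The sketch below is a plain address-range check. The memory map in it is an ASSUMPTION to verify against the AT91R40008 datasheet: 256 KB of internal SRAM at 0x00300000 after reset, also visible at 0x00000000 once the remap has been done.

```c
#include <stdint.h>
#include <stdbool.h>

/* ASSUMED memory map; check against the AT91R40008 datasheet. */
#define SRAM_BASE_BOOT   0x00300000u
#define SRAM_BASE_REMAP  0x00000000u
#define SRAM_SIZE        0x00040000u   /* 256 KB */

static bool in_range(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr - base < size;  /* unsigned wrap-around makes this one compare */
}

/* True if `addr` falls inside internal SRAM for the given remap state. */
bool is_internal_sram(uint32_t addr, bool remapped)
{
    if (in_range(addr, SRAM_BASE_BOOT, SRAM_SIZE))
        return true;
    return remapped && in_range(addr, SRAM_BASE_REMAP, SRAM_SIZE);
}
```

On the board, one would pass something like `(uint32_t)(uintptr_t)&C_Entry` and the value of `work_ptr`; if either check fails, the code or the buffer is going through the EBI and its wait states after all.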



--
Best Regards,
Ulf Samuelsson (E-Mail Removed)
This is a personal view which may or may not be
shared by my employer, Atmel Nordic AB


 
 
webwraith067
 
      02-04-2004, 08:02 PM
(E-Mail Removed) (42Bastian Schick) wrote in message news:<(E-Mail Removed)>...
> Short question: You did program the PLL ?
> Many MCU don't run at full speed after reset.
> ---
> 42Bastian
> Do not email to (E-Mail Removed), it's a spam-only account :-)
> Use <same-name>@epost.de instead !


The AT91R40008 does not have a programmable PLL. As far as I can tell,
the only ways to slow or stretch the clock are to pull down nWait or to
put the system into debug mode, and I am doing neither. Here's the
actual chip for reference:

http://www.atmel.com/dyn/products/pr...p?part_id=1981

Xander
 
 
Sprow
 
      02-04-2004, 11:22 PM
Tauno Voipio <(E-Mail Removed)> wrote in message news:<ZxaUb.483$(E-Mail Removed)>...
> webwraith067 wrote:
> > I have built a fully functional ARM7 prototype board
> > [...] but the performance of the
> > processor is approximately 1/10th what it should be.

>
> If you're accessing the internal RAM, you won't get external bus
> cycles of the accesses - your scoping results may be not valid.


And conversely, if it involves external SRAM, the ARM core speed is
largely irrelevant once you run it faster than 1/SRAM_access_time.
So for 70 ns SRAM you needn't bother trying to exceed 14 MHz.
For 70 ns 16-bit SRAM, where each 32-bit STR or STMIA transfer needs two
accesses, that drops to 7 MHz.
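Sprow's rule of thumb is simple arithmetic: the useful clock ceiling is the reciprocal of the access time, divided by the number of bus accesses each store needs. The sketch below encodes only that rule; it ignores wait-state granularity and setup/hold times, and `max_useful_mhz` is an illustrative name.

```c
#include <assert.h>

/* Highest core clock (MHz) worth running if every external access takes
 * `access_ns` and each store needs `bus_cycles` accesses (1 for a 32-bit
 * store to 32-bit memory, 2 for a 32-bit STR to 16-bit memory).
 * Rule of thumb only: 1 / (access time x accesses per store). */
double max_useful_mhz(double access_ns, unsigned bus_cycles)
{
    assert(access_ns > 0.0 && bus_cycles > 0);
    return 1000.0 / (access_ns * bus_cycles);  /* 1000 / ns = MHz */
}
```

For 70 ns parts this gives about 14.3 MHz for word-wide memory and about 7.1 MHz for 16-bit memory with 32-bit stores, matching the figures above.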

EBI setup is probably the one to watch.

> Also, on a 32 bit RISC core, you should test aligned 32-bit
> memory accesses, not bytes. Use a stmia instead of a strb.


As long as the byte-lane strobes are wired up, word, halfword, and
byte accesses to word-wide memory take the same time. Indeed, with
narrower memory configurations STRB would be 'faster', since you're not
having to slice the oversized read/store up into multiple accesses.
Sprow.
 
 
Mark Borgerson
 
      02-05-2004, 03:36 AM
In article <(E-Mail Removed) >,
(E-Mail Removed) says...
> Tauno Voipio <(E-Mail Removed)> wrote in message news:<ZxaUb.483$(E-Mail Removed)>...
> > webwraith067 wrote:
> > > I have built a fully functional ARM7 prototype board
> > > [...] but the performance of the
> > > processor is approximately 1/10th what it should be.

> >
> > If you're accessing the internal RAM, you won't get external bus
> > cycles of the accesses - your scoping results may be not valid.

>
> And conversely if it involves external SRAM the ARM core speed is
> largely irrelevant once you run it faster than 1/SRAM_access_time.
> So for 70ns SRAM you needn't bother trying to exceed 14MHz.
> For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
>
> EBI setup is probably the one to watch.
>
> > Also, on a 32 bit RISC core, you should test aligned 32-bit
> > memory accesses, not bytes. Use a stmia instead of a strb.

>
> As long as the byte lane strobes are wired up word, half word, and
> byte accesses take the same time to word wide memory, indeed with
> narrower memory configurations STRB would be 'faster' since you're not
> having to slice the oversized read/store up into multiple accesses,
> Sprow.
>


Out of curiosity, how does the ARM handle the transfer of a byte
to an odd address in 16-bit memory? Does it shift the byte to
bit positions 8..15, then do the equivalent of loading the
full 16-bit word from memory, merging in the high byte, and storing
the resulting 16-bit word back to memory? Or is there some other
mechanism? The method described would take two memory access
cycles, which could be one clock each, I suppose.

Mark Borgerson


 
 
Grant Edwards
 
      02-05-2004, 04:06 AM
In article <(E-Mail Removed) >, Sprow wrote:
> Tauno Voipio <(E-Mail Removed)> wrote in message news:<ZxaUb.483$(E-Mail Removed)>...
>> webwraith067 wrote:
>> > I have built a fully functional ARM7 prototype board
>> > [...] but the performance of the
>> > processor is approximately 1/10th what it should be.

>>
>> If you're accessing the internal RAM, you won't get external bus
>> cycles of the accesses - your scoping results may be not valid.

>
> And conversely if it involves external SRAM the ARM core speed is
> largely irrelevant once you run it faster than 1/SRAM_access_time.


Unless you've got a cache.

--
Grant Edwards                   grante at visi.com
Yow! Yow! Am I having fun yet?
 
 
Grant Edwards
 
      02-05-2004, 04:12 AM
In article <(E-Mail Removed)> , Mark Borgerson wrote:

> Out of curiosity, how does the ARM handle the transfer of a byte
> to an odd address in 16-bit memory?


Technically, the ARM doesn't handle it at all.

The bus interface does. That is outside the ARM core and varies
from one vendor to another. The most common method is to put
the value on bits 8-15 of the data bus and only assert the write
line for the "high" byte. If I were a betting man, I'd wager
that the value shows up on bits 0-7 of the data bus also, and
the only difference between a byte-write to an even address and
a byte write to an odd address is which of the two byte-write
lines goes active.
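The scheme Grant bets on can be modelled in a few lines: the byte is driven on both halves of the 16-bit data bus, and address bit A0 picks which byte-write strobe fires, so a single bus access suffices. The read-modify-write alternative Mark asks about would need a read plus a write, i.e. two accesses. The sketch assumes little-endian byte addressing (odd address = high lane); the signal names are generic, not the AT91 EBI pin names.

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative model of one byte write on a 16-bit bus with
 * per-byte write strobes. */
typedef struct {
    uint16_t data_bus;     /* value driven on D0-D15 */
    bool     write_low;    /* strobe for the low lane (D0-D7) */
    bool     write_high;   /* strobe for the high lane (D8-D15) */
    unsigned bus_accesses; /* external accesses needed */
} bus_write_t;

bus_write_t byte_write(uint32_t addr, uint8_t value)
{
    bus_write_t w;
    /* Same byte on both lanes; only the strobe chosen by A0 is asserted. */
    w.data_bus     = (uint16_t)((value << 8) | value);
    w.write_low    = (addr & 1u) == 0;  /* even address: low lane */
    w.write_high   = (addr & 1u) != 0;  /* odd address: high lane */
    w.bus_accesses = 1;                 /* no read-modify-write needed */
    return w;
}
```

For example, `byte_write(0x1001, 0xAB)` drives 0xABAB on the bus with only the high-byte strobe active, in one access.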

> Does it shift the byte to bit positions 8..15, then do the
> equivalent of loading the full 16-bit word from memory, moving
> in the high byte, then storing the resulting 16-bit word back
> to memory?


IMO, nobody in their right mind would do it that way.

> Or is there some other mechanism?


Read the manual for the part in question. It will say exactly
how it's done.

--
Grant Edwards                   grante at visi.com
Yow! Is something VIOLENT going to happen to a GARBAGE CAN?
 
 
 
 
All times are GMT.