Speed problems with ARM7, more detailed post..

Discussion in 'Embedded' started by webwraith067, Feb 4, 2004.

  1. webwraith067

    webwraith067 Guest

    I have built a fully functional ARM7 prototype board based on the
    Atmel AT91R40008 processor. Everything works fine, but the
    performance of the processor is approximately 1/10th what it should
    be. In a simple in-SRAM memory write test, I first copy my code to
    SRAM, then run out of SRAM and write blocks of 32 bytes to
    consecutive locations in an unrolled loop for a total of 9600 bytes
    (a simple test buffer), then do this loop 8 times so the scope can
    get a good lock. The original C/C++ code and the disassembled ARM
    code are below for reference. The key element is that, other than
    the looping overhead, the instruction stream should be nothing other
    than fetch, decode, execute of store byte immediate to internal SRAM
    of the form:

    STRB Rn,[ip,#dd]

    At worst this should take 1-3 cycles per operation. I am scoping
    this and getting a memory write approximately every 40 ("FORTY")
    cycles! This is bizarre. Of course the External Bus Interface
    settings are irrelevant for the internal bus, and I am not pulling
    on the external nWait pin. I hypothesize that the processor is in
    some mode after reset that makes it run slower. Maybe it has
    something to do with the debug interface, I am not sure; nothing I
    have found in all 3000+ pages of ARM docs leads me to any
    conclusion...

    As another brief example, this is the C/C++ code for a max speed I/O
    toggle, I basically have a scope on one of the I/O pins and I am
    toggling in a loop at max speed and then looking at the waveform:


    ******** C/C++ code

    while(1)
    {
    pio_base_ptr[PIO_SODR/4] = 0x00020000;
    pio_base_ptr[PIO_CODR/4] = 0x00020000;
    }

    And here's the disassembled ARM code: 5 instructions, yet it is
    taking nearly 400 clocks to run these 5 instructions! Again, running
    out of SRAM and that's it. Bizarre???

    ************* ARM CODE

    |L000630.J10.C_Entry|
    LDR a2,[v2,#4]
    STR a1,[a2,#&30]!
    LDR a2,[v2,#4]
    STR a1,[a2,#&34]!
    B |L000630.J10.C_Entry|
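
    For comparison, here is a rough estimate of what the five-instruction
    loop above should cost. It is only a sketch: it assumes the nominal
    ARM7TDMI cycle counts with zero wait states (3 for LDR, 2 for STR,
    3 for a taken branch) and ignores any extra cycles the peripheral
    bridge may add on the PIO writes. The function names are made up for
    illustration, and the 66 MHz figure is the clock mentioned later in
    the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal ARM7TDMI cycle counts, assuming zero wait states. */
enum {
    CYCLES_LDR    = 3, /* 1S + 1N + 1I */
    CYCLES_STR    = 2, /* 2N */
    CYCLES_BRANCH = 3  /* 2S + 1N (pipeline refill on a taken branch) */
};

/* Cycles for one pass of the toggle loop: LDR, STR, LDR, STR, B. */
static uint32_t toggle_loop_cycles(void)
{
    return 2u * CYCLES_LDR + 2u * CYCLES_STR + CYCLES_BRANCH;
}

/* Expected time for `cycles` core cycles, in picoseconds (truncated). */
static uint64_t loop_period_ps(uint32_t cycles, uint32_t clock_hz)
{
    assert(clock_hz > 0);
    return (uint64_t)cycles * 1000000000000ull / clock_hz;
}
```

    Under these assumptions, toggle_loop_cycles() gives 13 cycles, and
    loop_period_ps(13, 66000000) gives roughly 197 ns per pass, so a
    figure of nearly 400 clocks is about thirty times the nominal cost.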

    There are very few resources with HARDCORE info, any insight would be
    greatly appreciated :)

    Desperately seeking a GURU,


    Xander.



    *********** C/C++ version of the memory fill

    // fill memory up with incremental values

    for (t=0; t < 8; t++)
    for (ram_index = 0; ram_index < 9600/1-32; ram_index+=32)
    {
    work_ptr[ram_index+0] = 1;
    work_ptr[ram_index+1] = 2;
    work_ptr[ram_index+2] = 3;
    work_ptr[ram_index+3] = 4;
    work_ptr[ram_index+4] = 1;
    work_ptr[ram_index+5] = 2;
    work_ptr[ram_index+6] = 3;
    work_ptr[ram_index+7] = 4;
    work_ptr[ram_index+8] = 1;
    work_ptr[ram_index+9] = 2;
    work_ptr[ram_index+10] = 3;
    work_ptr[ram_index+11] = 4;
    work_ptr[ram_index+12] = 1;
    work_ptr[ram_index+13] = 2;
    work_ptr[ram_index+14] = 3;
    work_ptr[ram_index+15] = 4;
    work_ptr[ram_index+16] = 1;
    work_ptr[ram_index+17] = 2;
    work_ptr[ram_index+18] = 3;
    work_ptr[ram_index+19] = 4;
    work_ptr[ram_index+20] = 1;
    work_ptr[ram_index+21] = 2;
    work_ptr[ram_index+22] = 3;
    work_ptr[ram_index+23] = 4;
    work_ptr[ram_index+24] = 1;
    work_ptr[ram_index+25] = 2;
    work_ptr[ram_index+26] = 3;
    work_ptr[ram_index+27] = 4;
    work_ptr[ram_index+28] = 1;
    work_ptr[ram_index+29] = 2;
    work_ptr[ram_index+30] = 3;
    work_ptr[ram_index+31] = 4;
    }


    ********* ARM ASM version of the memory fill


    |L000638.J8.C_Entry|
    STR v2,[v4,#&c5c]
    MOV a2,#0
    STR v2,[v4,#&c60]
    |L000644.J10.C_Entry|
    MOV a1,#0
    |L000648.J11.C_Entry|
    STRB v2,[v1,a1]
    ADD ip,v1,a1
    STRB a4,[ip,#1]
    STRB v3,[ip,#2]
    STRB lr,[ip,#3]
    STRB v2,[ip,#4]
    STRB a4,[ip,#5]
    STRB v3,[ip,#6]
    STRB lr,[ip,#7]
    STRB v2,[ip,#8]
    STRB a4,[ip,#9]
    STRB v3,[ip,#&a]
    STRB lr,[ip,#&b]
    STRB v2,[ip,#&c]
    STRB a4,[ip,#&d]
    STRB v3,[ip,#&e]
    STRB lr,[ip,#&f]
    STRB v2,[ip,#&10]
    STRB a4,[ip,#&11]
    STRB v3,[ip,#&12]
    STRB lr,[ip,#&13]
    STRB v2,[ip,#&14]
    STRB a4,[ip,#&15]
    STRB v3,[ip,#&16]
    STRB lr,[ip,#&17]
    STRB v2,[ip,#&18]
    STRB a4,[ip,#&19]
    STRB v3,[ip,#&1a]
    STRB lr,[ip,#&1b]
    STRB v2,[ip,#&1c]
    STRB a4,[ip,#&1d]
    STRB v3,[ip,#&1e]
    STRB lr,[ip,#&1f]
    ADD a1,a1,#&20
    CMP a1,a3
    BLT |L000648.J11.C_Entry|
    ADD a2,a2,#1
    CMP a2,#8
    BLT |L000644.J10.C_Entry|
    B |L000638.J8.C_Entry|
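
    As a sanity check on the listing above, the inner loop can be costed
    the same way. This is a sketch under the same assumptions (nominal
    ARM7TDMI timings, zero wait states: 2 cycles per STRB, 1 per ADD or
    CMP, 3 for the taken BLT); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Nominal ARM7TDMI cycle counts, assuming zero wait states. */
enum {
    CYCLES_STRB   = 2, /* store: 2N */
    CYCLES_ALU    = 1, /* ADD, CMP: 1S */
    CYCLES_BRANCH = 3  /* taken branch: 2S + 1N */
};

/* One pass of the unrolled inner loop: `stores` STRBs, three ALU
 * operations (ADD ip, ADD a1, CMP) and one taken BLT. */
static uint32_t fill_block_cycles(uint32_t stores)
{
    return stores * CYCLES_STRB + 3u * CYCLES_ALU + CYCLES_BRANCH;
}

/* Average cost per byte stored, in thousandths of a cycle (truncated),
 * so the arithmetic stays in integers. */
static uint32_t millicycles_per_store(uint32_t stores)
{
    assert(stores > 0);
    return fill_block_cycles(stores) * 1000u / stores;
}
```

    For the 32-store block this comes to 70 cycles per pass, or a little
    under 2.2 cycles per byte, against the roughly 40 cycles per write
    seen on the scope.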
    webwraith067, Feb 4, 2004
    #1

  2. Short question: did you program the PLL?
    Many MCUs don't run at full speed after reset.
    ---
    42Bastian
    Do not email to , it's a spam-only account :)
    Use <same-name>@epost.de instead !
    42Bastian Schick, Feb 4, 2004
    #2

  3. In article <>,
    says...
    > I have built a fully functional ARM7 prototype board based on the
    > Atmel
    > AT91R40008 processor. Everything works fine, but the performance of
    > the
    > processor is approximately 1/10th what it should be. In a simple in
    > SRAM
    > memory write test, I first copy my code to SRAM, and then run out of
    > SRAM
    > and write blocks of 32 bytes to consequetive locations in an unrolled
    > loop
    > for a total of 9600 bytes (a simple test buffer) then do this loop 8
    > times,
    > so the scope can get a good lock. The original C/C++ code and the
    > dissasembled ARM code are below for reference. The key element is that
    > other
    > than the looping overhead the instruction stream should be nothing
    > other
    > than fetch, decode, execute of store byte immediate to internal SRAM
    > of the
    > form:
    >

    <<SNIP>>

    Hmm. The Atmel docs do say that byte and word access to the internal
    RAM is a single-cycle operation. However, they also talk about
    a mode that allows you to use the internal RAM to test apps that
    will go into flash. I wonder if that means that the processor,
    when set up that way also emulates the wait state settings for
    the external bus.


    Another question is: if you are running the code in internal
    RAM and are reading and storing bytes in internal RAM,
    what external signals are you monitoring with the scope?


    Mark Borgerson
    Mark Borgerson, Feb 4, 2004
    #3
  4. Tauno Voipio

    Tauno Voipio Guest

    webwraith067 wrote:
    > I have built a fully functional ARM7 prototype board based on the
    > Atmel
    > AT91R40008 processor. Everything works fine, but the performance of
    > the
    > processor is approximately 1/10th what it should be. In a simple in
    > SRAM
    > memory write test, I first copy my code to SRAM, and then run out of
    > SRAM
    > and write blocks of 32 bytes to consequetive locations in an unrolled
    > loop
    > for a total of 9600 bytes (a simple test buffer) then do this loop 8
    > times,
    > so the scope can get a good lock.


    If you're accessing the internal RAM, you won't get external bus
    cycles of the accesses - your scoping results may not be valid.

    Also, on a 32-bit RISC core, you should test aligned 32-bit
    memory accesses, not bytes. Use an STMIA instead of an STRB.

    HTH

    Tauno Voipio
    tauno voipio @ iki fi

    PS.

    I'd start the speed test by building a simple I/O bit on/off
    loop, measure its overhead and then add the instructions to
    be tested between the on and off writes.

    I have not noticed the advertised slowness with an AT91R40008,
    and I have several projects built with AT91s. You may be
    measuring the wrong thing.
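
    The differential measurement described above (time the bare on/off
    loop, then time it again with the instructions under test inserted)
    reduces to a small calculation. The sketch below is illustrative
    only: the function name is hypothetical, and the periods are assumed
    to be read off the scope in picoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Core cycles attributable to the instructions under test:
 * (loaded period - baseline period) converted to clock cycles and
 * rounded to the nearest whole cycle. Periods are in picoseconds.
 * The 64-bit product limits the difference to a fraction of a second
 * at tens of MHz, which is ample for a short loop. */
static uint64_t extra_cycles(uint64_t baseline_ps, uint64_t loaded_ps,
                             uint32_t clock_hz)
{
    assert(loaded_ps >= baseline_ps);
    uint64_t delta_ps = loaded_ps - baseline_ps;
    return (delta_ps * clock_hz + 500000000000ull) / 1000000000000ull;
}
```

    For example, a loop that takes 200 ns bare and 1200 ns with the test
    instructions inserted, on a 50 MHz clock, has spent 50 cycles on
    those instructions. Dividing by the number of inserted instructions
    gives cycles per instruction without the loop overhead.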

    TV
    Tauno Voipio, Feb 4, 2004
    #4
  5. "42Bastian Schick" <> wrote in message
    news:...
    > Short question: You did program the PLL ?
    > Many MCU don't run at full speed after reset.
    > ---
    > 42Bastian
    > Do not email to , it's a spam-only account :)
    > Use <same-name>@epost.de instead !


    The AT91R40008 has neither a PLL nor an internal oscillator.
    You feed the crystal oscillator signal directly to the chip.

    Check the wait state programming.
    How is the remap function handled?
    SRAM should be moved to address zero by the remap function.
    Check that you do not program an EBI register by mistake.

    In short:
    Initialize the chip EXACTLY as it is done on the EB40A.
    DON'T fool around with anything "clever" until the remap has completed.



    --
    Best Regards,
    Ulf Samuelsson
    This is a personal view which may or may not be
    shared by my employer, Atmel Nordic AB
    Ulf Samuelsson, Feb 4, 2004
    #5
  6. webwraith067

    webwraith067 Guest

    (42Bastian Schick) wrote in message news:<>...
    > Short question: You did program the PLL ?
    > Many MCU don't run at full speed after reset.
    > ---
    > 42Bastian
    > Do not email to , it's a spam-only account :)
    > Use <same-name>@epost.de instead !


    The AT91R40008 does not have a programmable PLL. As far as I can tell,
    the only way to slow or stretch the clock out is to pull down nWait or
    to put the system into debug mode, and I am doing neither. Here's the
    actual chip for reference:

    http://www.atmel.com/dyn/products/product_card.asp?part_id=1981

    Xander
    webwraith067, Feb 4, 2004
    #6
  7. Sprow

    Sprow Guest

    Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > webwraith067 wrote:
    > > I have built a fully functional ARM7 prototype board
    > > [...] but the performance of the
    > > processor is approximately 1/10th what it should be.

    >
    > If you're accessing the internal RAM, you won't get external bus
    > cycles of the accesses - your scoping results may be not valid.


    And conversely if it involves external SRAM the ARM core speed is
    largely irrelevant once you run it faster than 1/SRAM_access_time.
    So for 70ns SRAM you needn't bother trying to exceed 14MHz.
    For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
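
    The arithmetic behind those figures can be sketched as follows. It
    is a simplification that treats the SRAM access time as the only
    limit and assumes one bus access per bus-width chunk; the function
    name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Highest core clock, in kHz, beyond which one store per cycle would
 * outrun the external memory. An access wider than the bus is split
 * into several bus accesses (rounded up). */
static uint32_t max_useful_clock_khz(uint32_t access_time_ns,
                                     uint32_t bus_width_bits,
                                     uint32_t access_bits)
{
    assert(access_time_ns > 0 && bus_width_bits > 0 && access_bits > 0);
    uint32_t bus_accesses =
        (access_bits + bus_width_bits - 1u) / bus_width_bits;
    /* 1 / (ns) expressed in kHz is 1,000,000 / ns. */
    return 1000000u / (access_time_ns * bus_accesses);
}
```

    With 70 ns SRAM this gives about 14.3 MHz for a 32-bit store to
    32-bit memory, about 7.1 MHz for a 32-bit store to 16-bit memory,
    and about 14.3 MHz again for a byte store to 16-bit memory, since a
    byte needs only one bus access.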

    EBI setup is probably the one to watch.

    > Also, on a 32 bit RISC core, you should test aligned 32-bit
    > memory accesses, not bytes. Use a stmia instead of a strb.


    As long as the byte lane strobes are wired up, word, half word, and
    byte accesses take the same time to word-wide memory. Indeed, with
    narrower memory configurations STRB would be 'faster', since you're
    not having to slice the oversized read/store up into multiple
    accesses.
    Sprow.
    Sprow, Feb 4, 2004
    #7
  8. In article <>,
    says...
    > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > > webwraith067 wrote:
    > > > I have built a fully functional ARM7 prototype board
    > > > [...] but the performance of the
    > > > processor is approximately 1/10th what it should be.

    > >
    > > If you're accessing the internal RAM, you won't get external bus
    > > cycles of the accesses - your scoping results may be not valid.

    >
    > And conversely if it involves external SRAM the ARM core speed is
    > largely irrelevant once you run it faster than 1/SRAM_access_time.
    > So for 70ns SRAM you needn't bother trying to exceed 14MHz.
    > For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
    >
    > EBI setup is probably the one to watch.
    >
    > > Also, on a 32 bit RISC core, you should test aligned 32-bit
    > > memory accesses, not bytes. Use a stmia instead of a strb.

    >
    > As long as the byte lane strobes are wired up word, half word, and
    > byte accesses take the same time to word wide memory, indeed with
    > narrower memory configurations STRB would be 'faster' since you're not
    > having to slice the oversized read/store up into multiple accesses,
    > Sprow.
    >


    Out of curiosity, how does the ARM handle the transfer of a byte
    to an odd address in 16-bit memory? Does it shift the byte to
    bit positions 8..15, then do the equivalent of loading the
    full 16-bit word from memory, moving in the high byte, then storing
    the resulting 16-bit word back to memory? Or is there some other
    mechanism? The method described would take two memory access
    cycles---which could be one clock each, I suppose.

    Mark Borgerson
    Mark Borgerson, Feb 5, 2004
    #8
  9. In article <>, Sprow wrote:
    > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    >> webwraith067 wrote:
    >> > I have built a fully functional ARM7 prototype board
    >> > [...] but the performance of the
    >> > processor is approximately 1/10th what it should be.

    >>
    >> If you're accessing the internal RAM, you won't get external bus
    >> cycles of the accesses - your scoping results may be not valid.

    >
    > And conversely if it involves external SRAM the ARM core speed is
    > largely irrelevant once you run it faster than 1/SRAM_access_time.


    Unless you've got a cache.

    --
    Grant Edwards, grante at visi.com
    Yow! Yow! Am I having fun yet?
    Grant Edwards, Feb 5, 2004
    #9
  10. In article <>, Mark Borgerson wrote:

    > Out of curiosity, how does the ARM handle the transfer of a byte
    > to an odd address in 16-bit memory?


    Technically, the ARM doesn't handle it at all.

    The bus interface does. That is outside the ARM core and varies
    from one vendor to another. The most common method is to put
    the value on bits 8-15 of the data bus and only assert the write
    line for the "high" byte. If I were a betting man, I'd wager
    that the value shows up on bits 0-7 of the data bus also, and
    the only difference between a byte-write to an even address and
    a byte write to an odd address is which of the two byte-write
    lines goes active.
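
    A toy model of that scheme, for illustration only: it does not
    describe any particular vendor's bus interface, and the type and
    function names are invented. It assumes the odd address maps to
    D8..D15, as in the description above.

```c
#include <assert.h>
#include <stdint.h>

/* What the bus interface drives during one byte write to 16-bit memory. */
typedef struct {
    uint16_t data;   /* value on D0..D15 */
    int write_low;   /* byte-write strobe for D0..D7  (even address) */
    int write_high;  /* byte-write strobe for D8..D15 (odd address)  */
} bus_cycle_t;

/* Put the byte on both lanes and assert only the strobe that matches
 * the address; no read-modify-write is needed. */
static bus_cycle_t byte_write_cycle(uint32_t address, uint8_t value)
{
    bus_cycle_t c;
    c.data = (uint16_t)(((uint16_t)value << 8) | value);
    c.write_low = (address & 1u) == 0;
    c.write_high = (address & 1u) != 0;
    return c;
}

/* The memory latches only the lanes whose strobe is active. */
static void memory_apply(uint16_t *word, bus_cycle_t c)
{
    if (c.write_low)
        *word = (uint16_t)((*word & 0xFF00u) | (c.data & 0x00FFu));
    if (c.write_high)
        *word = (uint16_t)((*word & 0x00FFu) | (c.data & 0xFF00u));
}
```

    Each byte write is a single bus cycle in this model, whichever half
    of the 16-bit word it lands in.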

    > Does it shift the byte to bit positions 8..15, then do the
    > equivalent of loading the full 16-bit word from memory, moving
    > in the high byte, then storing the resulting 16-bit word back
    > to memory?


    IMO, nobody in their right mind would do it that way.

    > Or is there some other mechanism?


    Read the manual for the part in question. It will say exactly
    how it's done.

    --
    Grant Edwards, grante at visi.com
    Yow! Is something VIOLENT going to happen to a GARBAGE CAN?
    Grant Edwards, Feb 5, 2004
    #10
  11. webwraith067

    webwraith067 Guest

    "Ulf Samuelsson" <> wrote in message news:<N1bUb.2775$O41.76025@amstwist00>...
    > "42Bastian Schick" <> wrote in message
    > news:...
    > > Short question: You did program the PLL ?
    > > Many MCU don't run at full speed after reset.
    > > ---
    > > 42Bastian
    > > Do not email to , it's a spam-only account :)
    > > Use <same-name>@epost.de instead !

    >
    > The AT91R40008 does not have a PLL nor internal oscillator.
    > You feed the Crystal Oscillator signal directly to the chip.
    >
    > Check wait state programming.
    > How is the remap function handled?
    > SRAM should be moved to address zero by the remap function.
    > Check that you do not by mistake program an EBI register
    >
    > In short:
    > Initialize the chip EXACTLY as it is done on the EB40A.
    > DON'T fool around with anything "clever" until the remap has completed.


    I have remapped it perfectly. The EBI wait states are irrelevant for
    internal access, but of course they are set properly. The remap
    operation, everything, is perfect. The code is running out of SRAM at
    0x00000000; it's on the chip, simple as that, and running 10-50 times
    slower than it should.

    Xander
    webwraith067, Feb 5, 2004
    #11
  12. webwraith067

    webwraith067 Guest

    Mark Borgerson <> wrote in message news:<>...
    > In article <>,
    > says...
    > > I have built a fully functional ARM7 prototype board based on the
    > > Atmel
    > > AT91R40008 processor. Everything works fine, but the performance of
    > > the
    > > processor is approximately 1/10th what it should be. In a simple in
    > > SRAM
    > > memory write test, I first copy my code to SRAM, and then run out of
    > > SRAM
    > > and write blocks of 32 bytes to consequetive locations in an unrolled
    > > loop
    > > for a total of 9600 bytes (a simple test buffer) then do this loop 8
    > > times,
    > > so the scope can get a good lock. The original C/C++ code and the
    > > dissasembled ARM code are below for reference. The key element is that
    > > other
    > > than the looping overhead the instruction stream should be nothing
    > > other
    > > than fetch, decode, execute of store byte immediate to internal SRAM
    > > of the
    > > form:
    > >

    > <<SNIP>>
    >
    > Hmm. The Atmel docs do say that byte and word access to the internal
    > RAM is a single-cycle operation. However, they also talk about
    > a mode that allows you to use the internal RAM to test apps that
    > will go into flash. I wonder if that means that the processor,
    > when set up that way also emulates the wait state settings for
    > the external bus.
    >
    >
    > Another question is: if you are running the code in internal
    > RAM and are reading and storing bytes in internal RAM,
    > what external signals are you monitoring with the scope?
    >
    >
    > Mark Borgerson


    The EBI bus interface still outputs all the internal activity: the
    address bus, control bus, etc. all still do their thing, and the chip
    select lines nCS0-nCS3 only become active on an external address.
    Also, all internal SRAM accesses are 0 wait state. However, we are
    still talking about nearly two orders of magnitude slowdown here. If
    there were gremlins and the system was talking to a flash or
    external memory (of which there is none) at 8 wait states plus the 8
    data float cycles, then that would be 16 cycles per instruction more
    or less. I am talking about 250-500 here; it's truly bizarre.
    webwraith067, Feb 5, 2004
    #12
  13. webwraith067

    webwraith067 Guest

    (Sprow) wrote in message news:<>...
    > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > > webwraith067 wrote:
    > > > I have built a fully functional ARM7 prototype board
    > > > [...] but the performance of the
    > > > processor is approximately 1/10th what it should be.

    > >
    > > If you're accessing the internal RAM, you won't get external bus
    > > cycles of the accesses - your scoping results may be not valid.

    >
    > And conversely if it involves external SRAM the ARM core speed is
    > largely irrelevant once you run it faster than 1/SRAM_access_time.
    > So for 70ns SRAM you needn't bother trying to exceed 14MHz.
    > For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
    >
    > EBI setup is probably the one to watch.
    >
    > > Also, on a 32 bit RISC core, you should test aligned 32-bit
    > > memory accesses, not bytes. Use a stmia instead of a strb.

    >
    > As long as the byte lane strobes are wired up word, half word, and
    > byte accesses take the same time to word wide memory, indeed with
    > narrower memory configurations STRB would be 'faster' since you're not
    > having to slice the oversized read/store up into multiple accesses,
    > Sprow.


    This is moot though. I am running from the internal SRAM, the loop is
    5 instructions, and it's running at a performance level of 0.75 MHz
    with a 66 MHz clock.
    webwraith067, Feb 5, 2004
    #13
  14. webwraith067

    webwraith067 Guest

    Mark Borgerson <> wrote in message news:<>...
    > In article <>,
    > says...
    > > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > > > webwraith067 wrote:
    > > > > I have built a fully functional ARM7 prototype board
    > > > > [...] but the performance of the
    > > > > processor is approximately 1/10th what it should be.
    > > >
    > > > If you're accessing the internal RAM, you won't get external bus
    > > > cycles of the accesses - your scoping results may be not valid.

    > >
    > > And conversely if it involves external SRAM the ARM core speed is
    > > largely irrelevant once you run it faster than 1/SRAM_access_time.
    > > So for 70ns SRAM you needn't bother trying to exceed 14MHz.
    > > For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
    > >
    > > EBI setup is probably the one to watch.
    > >
    > > > Also, on a 32 bit RISC core, you should test aligned 32-bit
    > > > memory accesses, not bytes. Use a stmia instead of a strb.

    > >
    > > As long as the byte lane strobes are wired up word, half word, and
    > > byte accesses take the same time to word wide memory, indeed with
    > > narrower memory configurations STRB would be 'faster' since you're not
    > > having to slice the oversized read/store up into multiple accesses,
    > > Sprow.
    > >

    >
    > Out of curiosity, how does the ARM handle the transfer of a byte
    > to an odd address in 16-bit memory? Does it shift the byte to
    > bit positions 8..15, then do the equivalent of loading the
    > full 16-bit word from memory, moving in the high byte, then storing
    > the resulting 16-bit word back to memory? Or is there some other
    > mechanism? The method described would take two memory access
    > cycles---which could be one clock each, I suppose.
    >
    > Mark Borgerson


    The ARM shifts the data in the half word; in the worst case you lose
    1 cycle. However, the internal SRAM has no restriction: it's 1 cycle
    access for byte, word, quad. And of course we are running out of SRAM
    internally -- this is totally on the chip: nothing but the chip, a
    66 MHz clock, and a scope/LA watching everything. The key is the I/O
    pin I am toggling with the loop:

    while(1)
    {
    write(1);
    write(0);
    }

    Which assembles to 5 instructions, is toggling the I/O at 250-260
    clocks per instruction, and of course this is the same for external
    memory, EVERYTHING. There is something deeper going on than simple
    explanations. The ICE embedded in the ARM has to have something to do
    with this. I have a scary feeling it's clocking the entire boundary
    scan each cycle around to the JTAG port: there are 100 pins on the
    ARM, and I am getting 100 times slowdown -- coincidence?
    webwraith067, Feb 5, 2004
    #14
  15. In article <>,
    says...
    > Mark Borgerson <> wrote in message news:<>...
    > > In article <>,
    > > says...
    > > > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > > > > webwraith067 wrote:
    > > > > > I have built a fully functional ARM7 prototype board
    > > > > > [...] but the performance of the
    > > > > > processor is approximately 1/10th what it should be.
    > > > >
    > > > > If you're accessing the internal RAM, you won't get external bus
    > > > > cycles of the accesses - your scoping results may be not valid.
    > > >
    > > > And conversely if it involves external SRAM the ARM core speed is
    > > > largely irrelevant once you run it faster than 1/SRAM_access_time.
    > > > So for 70ns SRAM you needn't bother trying to exceed 14MHz.
    > > > For 70ns 16 bit SRAM that drops to 7MHz for STR or STMIA.
    > > >
    > > > EBI setup is probably the one to watch.
    > > >
    > > > > Also, on a 32 bit RISC core, you should test aligned 32-bit
    > > > > memory accesses, not bytes. Use a stmia instead of a strb.
    > > >
    > > > As long as the byte lane strobes are wired up word, half word, and
    > > > byte accesses take the same time to word wide memory, indeed with
    > > > narrower memory configurations STRB would be 'faster' since you're not
    > > > having to slice the oversized read/store up into multiple accesses,
    > > > Sprow.
    > > >

    > >
    > > Out of curiosity, how does the ARM handle the transfer of a byte
    > > to an odd address in 16-bit memory? Does it shift the byte to
    > > bit positions 8..15, then do the equivalent of loading the
    > > full 16-bit word from memory, moving in the high byte, then storing
    > > the resulting 16-bit word back to memory? Or is there some other
    > > mechanism? The method described would take two memory access
    > > cycles---which could be one clock each, I suppose.
    > >
    > > Mark Borgerson

    >
    > The arm shifts the data in the half word, in worst case you loose 1
    > cycle, however, the internal SRAM has no restriction, its 1 cycle
    > access for byte, word, quad. And of course we are running out of SRAM
    > internally -- this is totally on the chip, nothing, but the chip, a 66
    > mhz clock, and a scope/LA watching everything, the key is that the I/O
    > pin I am toggling with the loop:
    >
    > while(1)
    > {
    > write(1);
    > write(0);
    > }
    >
    > Which assembles to 5 instructions is toggling the I/O at 250-260
    > clocks per instruction, and of course this is the same for external
    > memory, EVERYTHING, there is something deeper going on than simple
    > explanations. The ICE embedded in the ARM has to have something to do
    > with this, I have a scary feeling its clocking the entire boundary
    > scan each cycle around to the JTAG port, there are 100 pins on the
    > ARM, and I am getting 100 times slowdown -- coincidence?
    >

    Do you have Reset, TMS, TCK, and TDI pulled up with 10K? Is there any
    activity on the JTAG lines?


    Mark Borgerson
    Mark Borgerson, Feb 5, 2004
    #15
  16. > The arm shifts the data in the half word, in worst case you loose 1
    > cycle, however, the internal SRAM has no restriction, its 1 cycle
    > access for byte, word, quad. And of course we are running out of SRAM
    > internally -- this is totally on the chip, nothing, but the chip, a 66
    > mhz clock, and a scope/LA watching everything, the key is that the I/O
    > pin I am toggling with the loop:
    >
    > while(1)
    > {
    > write(1);
    > write(0);
    > }
    >
    > Which assembles to 5 instructions is toggling the I/O at 250-260
    > clocks per instruction, and of course this is the same for external
    > memory, EVERYTHING, there is something deeper going on than simple
    > explanations. The ICE embedded in the ARM has to have something to do
    > with this, I have a scary feeling its clocking the entire boundary
    > scan each cycle around to the JTAG port, there are 100 pins on the
    > ARM, and I am getting 100 times slowdown -- coincidence?


    Definitely a coincidence, since the AT91R40008 only has a JTAG embedded ICE
    and no JTAG boundary scan around the pins.

    We need more info.

    Maybe try enabling more chipselects.
    1 W/S for chipselect 1
    2 W/S for chipselect 2
    3 W/S for chipselect 3.

    and then do a store to each chipselect in succession.
    Use a scope and trigger on cs1.
    Check the length of each cs to determine if the chip runs at a slower clock
    for an obscure reason.
    Check the distance between the chip selects to see if anything funny occurs
    between instructions.
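
    The chip-select experiment above boils down to the following
    calculation. This is a sketch that assumes each extra wait state
    lengthens the chip-select pulse by exactly one clock period (the
    datasheet timing diagrams are the authority on that); the function
    names are hypothetical and the pulse widths are taken in picoseconds.

```c
#include <assert.h>
#include <stdint.h>

/* Clock period implied by two chip-select pulse widths that differ by
 * a known number of wait states. */
static uint32_t clock_period_ps(uint32_t cs_short_ps, uint32_t cs_long_ps,
                                uint32_t extra_wait_states)
{
    assert(extra_wait_states > 0 && cs_long_ps > cs_short_ps);
    return (cs_long_ps - cs_short_ps) / extra_wait_states;
}

/* Convert a period in picoseconds to a frequency in kHz (truncated). */
static uint32_t clock_khz_from_period(uint32_t period_ps)
{
    assert(period_ps > 0);
    return 1000000000u / period_ps;
}
```

    If the chip-select for the 2 W/S region is 15 ns longer than the one
    for the 1 W/S region, the bus is being clocked at about 66.7 MHz. A
    much larger difference would point to a slower effective clock, and
    a normal difference with long gaps between chip selects would point
    to time being lost between instructions.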

    --
    Best Regards
    Ulf at atmel dot com
    These comments are intended to be my own opinion and they
    may, or may not be shared by my employer, Atmel Sweden.
    Ulf Samuelsson, Feb 5, 2004
    #16
  17. Sprow

    Sprow Guest

    (webwraith067) wrote in message news:<>...
    > (Sprow) wrote in message news:<>...
    > > Tauno Voipio <> wrote in message news:<ZxaUb.483$>...
    > > > webwraith067 wrote:
    > > > > I have built a fully functional ARM7 prototype board
    > > > > [...] but the performance of the
    > > > > processor is approximately 1/10th what it should be.
    > > >
    > > > If you're accessing the internal RAM, you won't get external bus
    > > > cycles of the accesses - your scoping results may be not valid.

    > >
    > > And conversely if it involves external SRAM the ARM core speed is
    > > largely irrelevant once you run it faster than 1/SRAM_access_time.

    >
    > Unless you've got a cache


    Cache would be nice, but no cache to be had in this case.
    Things get more fun because if you continually cause cache misses then
    you end up with 1/SRAM_access_time again while cache line fills occur,
    this is well documented in long boring papers elsewhere I'm sure!

    > This is moot though, I am running from the internal SRAM, the loop is
    > 5 instructions, its running at a performance level of .75 MHz with a
    > 66 mhz clock.


    Moot, but worth saying.
    I thought in the original post that you looked at it on a scope?
    Sprow.
    Sprow, Feb 5, 2004
    #17
  18. > > Unless you've got a cache
    >
    > Cache would be nice, but no cache to be had in this case.
    > Things get more fun because if you continually cause cache misses then
    > you end up with 1/SRAM_access_time again while cache line fills occur,
    > this is well documented in long boring papers elsewhere I'm sure!
    >


    The ARM7TDMI bus structure is no good when you have a cache.
    The cache adds one wait state to accesses, so when you have a cache
    miss your performance drops to half compared to a non-cache solution.


    --
    Best Regards,
    Ulf Samuelsson
    This is a personal view which may or may not be
    shared by my employer, Atmel Nordic AB
    Ulf Samuelsson, Feb 5, 2004
    #18