
Re: What is happening to Atmel EEPROMs?

Discussion in 'Embedded' started by Bob Eld, Mar 25, 2010.

  1. Bob Eld

    Bob Eld Guest

    "Peter" <> wrote in message
    news:...
    > They have doubled their prices and the lead times are 18 weeks.
    >
    > Yet, others are making them OK.
    >
    > Are Atmel trying to get out of the business?
    > x----------x


    At one point Microchip was trying to buy Atmel. Whatever happened to that?
     
    Bob Eld, Mar 25, 2010
    #1

  2. Spehro Pefhany, Mar 25, 2010
    #2

  3. TheM

    TheM Guest

    "Spehro Pefhany" <> wrote in message news:...
    > On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    > wrote:
    >
    >>
    >>"Peter" <> wrote in message
    >>news:...
    >>> They have doubled their prices and the lead times are 18 weeks.


    Is this limited to EEPROM/Memory only or uCPU as well?

    Definitely worth considering getting out of AVR.
    Do NXP ARMs come with on-chip FLASH?

    M
     
    TheM, Mar 26, 2010
    #3
  4. Nico Coesel

    Nico Coesel Guest

    "TheM" <> wrote:

    >"Spehro Pefhany" <> wrote in message news:...
    >> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >> wrote:
    >>
    >>>
    >>>"Peter" <> wrote in message
    >>>news:...
    >>>> They have doubled their prices and the lead times are 18 weeks.

    >
    >Is this limited to EEPROM/Memory only or uCPU as well?
    >
    >Definitely worth considering getting out of AVR.
    >Do NPX ARM come with on-chip FLASH?


    Yes, all of them have 128 bit wide flash that allows zero waitstate
    execution at the maximum CPU clock.

    --
    Failure does not prove something is impossible, failure simply
    indicates you are not using the right tools...
    nico@nctdevpuntnl (punt=.)
    --------------------------------------------------------------
     
    Nico Coesel, Mar 26, 2010
    #4
  5. TheM

    TheM Guest

    "Nico Coesel" <> wrote in message news:...
    > "TheM" <> wrote:
    >
    >>"Spehro Pefhany" <> wrote in message news:...
    >>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>> wrote:
    >>>
    >>>>
    >>>>"Peter" <> wrote in message
    >>>>news:...
    >>>>> They have doubled their prices and the lead times are 18 weeks.

    >>
    >>Is this limited to EEPROM/Memory only or uCPU as well?
    >>
    >>Definitely worth considering getting out of AVR.
    >>Do NPX ARM come with on-chip FLASH?

    >
    > Yes, all of them have 128 bit wide flash that allows zero waitstate
    > execution at the maximum CPU clock.


    Not bad, I ordered a couple of books on ARM off Amazon, may get into it finally.
    From what I see they are the same price as an AVR mega, low power and much faster.
    And NXP is very generous with samples.

    M
     
    TheM, Mar 26, 2010
    #5
  6. On Mar 26, 11:55 am, Peter <> wrote:
    >  "TheM" <> wrote:
    > >"Spehro Pefhany" <> wrote in messagenews:...
    > >> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    > >> wrote:

    >
    > >>>"Peter" <> wrote in message
    > >>>news:...
    > >>>> They have doubled their prices and the lead times are 18 weeks.

    >
    > >Is this limited to EEPROM/Memory only or uCPU as well?

    >
    > >Definitely worth considering getting out of AVR.
    > >Do NPX ARM come with on-chip FLASH?

    >
    > >M

    >
    > I simply cannot believe Atmel are going to drop all the AVR users in
    > the sh*t.
    >
    > I've been in electronics design and manufacturing since the mid 1970s
    > and have seen these "crises" so many times.
    >
    > At one time we used to buy a 74LS245 for 20 pence and months later
    > they were £2.50 - a 12x rise.
    >
    > How did this happen? Did the whole world suddenly want a 74LS245?
    >
    > No.
    >
    > What happened is that there was an over-supply of 74LS (following,
    > guess what, a previous price bump) and the prices plummetted. So the
    > distis, being cynical bastards, sent out their sales reps to spread
    > stories of "74LS going on allocation".
    >
    > "Allocation" is the word every buyer dreads because it means you don't
    > get a lead time quoted, so basically you have to massively over-order,
    > with several distis at the same time.
    >
    > The stock of course arrives, some months later, and then you are
    > over-stocked for a few years ;)
    >
    > And the cycle repeats but the cynical bastard salesmen collected their
    > commissions, left those companies, and are now marketing managers ;)
    > So they never face the music.
    >
    > Currently, there is a lot of crap being spread around about Allocation
    > yet again, and sure enough people are starting to buy into it, and
    > lead times are growing.
    >
    > However, interestingly, we are likely to end up in a situation where
    > our main products will be in two versions, one using the Hitachi
    > H8/323 and the other using an Atmega128 (or whatever), but externally
    > they will be exactly identical functionally. I have 5-10 year stock of
    > the H8 in a safe, and will keep a similar stock of the ATmega.
    >
    > x----------x




    Was that back when supposedly the encapsulation epoxy plant burned
    down? Maybe that was a different 'crisis'. lol!
     
    1 Lucky Texan, Mar 26, 2010
    #6
  7. TheM

    TheM Guest

    "1 Lucky Texan" <> wrote in message news:...
    On Mar 26, 11:55 am, Peter <> wrote:

    >> I've been in electronics design and manufacturing since the mid 1970s
    >> and have seen these "crises" so many times.
    >>
    >> At one time we used to buy a 74LS245 for 20 pence and months later
    >> they were £2.50 - a 12x rise.
    >>
    >> How did this happen? Did the whole world suddenly want a 74LS245?
    >>
    >> No.
    >>
    >> What happened is that there was an over-supply of 74LS (following,
    >> guess what, a previous price bump) and the prices plummetted. So the
    >> distis, being cynical bastards, sent out their sales reps to spread
    >> stories of "74LS going on allocation".
    >>
    >> "Allocation" is the word every buyer dreads because it means you don't
    >> get a lead time quoted, so basically you have to massively over-order,
    >> with several distis at the same time.
    >>
    >> The stock of course arrives, some months later, and then you are
    >> over-stocked for a few years ;)
    >>
    >> And the cycle repeats but the cynical bastard salesmen collected their
    >> commissions, left those companies, and are now marketing managers ;)
    >> So they never face the music.
    >>
    >> Currently, there is a lot of crap being spread around about Allocation
    >> yet again, and sure enough people are starting to buy into it, and
    >> lead times are growing.


    >Was that back when supposedly the encapsulation epoxy plant burned
    >down? Maybe that was different 'crisis'. lol!


    I remember RAM becoming more expensive after each major earthquake
    in Asia some years ago.

    M
     
    TheM, Mar 26, 2010
    #7
  8. Peter

    Peter Guest

    1 Lucky Texan <> wrote

    >Was that back when supposedly the encapsulation epoxy plant burned
    >down? Maybe that was different 'crisis'. lol!


    I recall that!!

    Yeah, very likely.

    A Japanese earthquake was another good one for lead times. The distis jump
    on anything.
     
    Peter, Mar 26, 2010
    #8
  9. Nico Coesel

    Nico Coesel Guest

    "TheM" <> wrote:

    >"Nico Coesel" <> wrote in message news:...
    >> "TheM" <> wrote:
    >>
    >>>"Spehro Pefhany" <> wrote in message news:...
    >>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>>> wrote:
    >>>>
    >>>>>
    >>>>>"Peter" <> wrote in message
    >>>>>news:...
    >>>>>> They have doubled their prices and the lead times are 18 weeks.
    >>>
    >>>Is this limited to EEPROM/Memory only or uCPU as well?
    >>>
    >>>Definitely worth considering getting out of AVR.
    >>>Do NPX ARM come with on-chip FLASH?

    >>
    >> Yes, all of them have 128 bit wide flash that allows zero waitstate
    >> execution at the maximum CPU clock.

    >
    >Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    >From what I see they are same price as AVR mega, low power and much faster.
    >And NXP is very generous with samples.


    The books on ARM may be too generic. Most of the things you need to
    know are in the user manual and the datasheet. NXP's Cortex-based
    LPC1000 series needs no assembly at all to get running. Even interrupt
    routines do not need special care.
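
    As an illustration of that point: on a Cortex-M part the hardware stacks
    the scratch registers itself and the vector table holds plain C function
    addresses, so a handler is just an ordinary C function. A minimal sketch,
    assuming a CMSIS-style device header (the "LPC17xx.h" name and the use of
    SysTick are illustrative assumptions; any Cortex-M part works the same way):

    /* Sketch: a Cortex-M interrupt handler written entirely in C.
     * The CMSIS startup code places SysTick_Handler in the vector table,
     * and the core stacks r0-r3, r12, lr, pc and xPSR automatically.
     */
    #include "LPC17xx.h"              /* CMSIS device header (assumed name) */

    volatile uint32_t tick_count;     /* shared with main(), hence volatile */

    void SysTick_Handler(void)        /* plain C function, no assembly glue */
    {
        tick_count++;
    }

    int main(void)
    {
        SysTick_Config(SystemCoreClock / 1000u);  /* 1 ms tick, CMSIS call */
        for (;;) {
            /* application code; tick_count advances in the background */
        }
    }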

    --
    Failure does not prove something is impossible, failure simply
    indicates you are not using the right tools...
    nico@nctdevpuntnl (punt=.)
    --------------------------------------------------------------
     
    Nico Coesel, Mar 26, 2010
    #9
  10. TheM wrote:
    > "Nico Coesel" <> wrote in message news:...
    >> "TheM" <> wrote:
    >>
    >>> "Spehro Pefhany" <> wrote in message news:...
    >>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>>> wrote:
    >>>>
    >>>>> "Peter" <> wrote in message
    >>>>> news:...
    >>>>>> They have doubled their prices and the lead times are 18 weeks.
    >>> Is this limited to EEPROM/Memory only or uCPU as well?
    >>>
    >>> Definitely worth considering getting out of AVR.
    >>> Do NPX ARM come with on-chip FLASH?

    >> Yes, all of them have 128 bit wide flash that allows zero waitstate
    >> execution at the maximum CPU clock.

    >
    > Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    > From what I see they are same price as AVR mega, low power and much faster.
    > And NXP is very generous with samples.
    >
    > M
    >
    >


    The typical 32-bitters of today are implemented using advanced
    flash technologies which allow high-density memories in small chip
    areas, but they are not low power.

    The inherent properties of the process make for high leakage.
    When you see a sleep power consumption of around 1-2 uA,
    that is with the chip essentially turned OFF:
    only a small part of the chip is powered, the RTC and a few other things.

    When you implement in a 0.25u process or higher, you can have the chip
    fully initialized and ready to react to input while using
    1-2 uA in sleep.

    That is a big difference.

    While the NXP devices get zero waitstates from the 128-bit bus,
    this also makes them extremely power hungry.
    An LPC ARM7 uses about 2x the current of a SAM7.
    It does get higher performance in ARM mode.

    ARM mode has a price in code size, so if you want more features,
    you had better run in Thumb mode. The SAM7 with its 32-bit flash is
    actually faster than the LPC when running in Thumb mode
    (at the same frequency), since the SAM7 uses a 33 MHz flash
    while the LPC uses a 24 MHz flash.
    In thumb mode, the 32 bit access gives you two instructions
    per cycle so in average this gives you 1 instruction per clock on the SAM7.

    Fewer waitstates mean higher performance.
    By copying a few 32-bit ARM routines to SRAM,
    you can overcome that limitation.
    You can get a slightly higher top frequency out of the LPC,
    but that again increases the power consumption.
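
    A minimal GCC-flavoured sketch of that SRAM-copy trick: put the hot
    routine in a RAM section and copy it out of flash once at startup.
    The ".ramfunc" section and the _sramfunc/_eramfunc/_framfunc symbols are
    assumptions that have to exist in your own linker script; many startup
    files already do this copy for you.

    /* Run one hot routine from SRAM while the rest executes from flash. */
    #include <stdint.h>
    #include <string.h>

    extern uint8_t _sramfunc[], _eramfunc[], _framfunc[]; /* linker symbols */

    __attribute__((section(".ramfunc"), noinline, long_call))
    void fir_inner_loop(int32_t *acc, const int16_t *coef,
                        const int16_t *samp, int n)
    {
        while (n--)
            *acc += (int32_t)*coef++ * (int32_t)*samp++;
    }

    void copy_ramfuncs(void)          /* call once, early, before first use */
    {
        memcpy(_sramfunc, _framfunc, (size_t)(_eramfunc - _sramfunc));
    }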


    For Cortex-M3 I did some tests on the new SAM3, which can be
    configured to use either 64-bit or 128-bit memories.
    With a 128-bit memory, you can wring about 5% extra performance
    out of the chip compared to 64-bit operation.
    From a power consumption point of view it is probably better
    to increase the clock frequency by 5% than to enable the 128-bit mode.
    It is therefore only the most demanding applications that have
    any use for the 128-bit memory.

    Testing on other Cortex-M3 chips indicates similar results.

    Someone told me that they tried executing out of SRAM on an STM32
    and this was actually slower than executing out of flash.
    Executing out of external memory also appears to be a problem,
    since there is no cache/burst and bandwidth seems to be lower
    than equivalent ARM7 devices.

    My current guess is that the AHB bus has some delays due to
    synchronization. Also, if you execute out of SRAM
    you are going to have conflicts with data accesses,
    something which is avoided when you execute out of flash.


    I would be curious to hear about other people's experience with this.

    Best Regards
    Ulf Samuelsson
     
    Ulf Samuelsson, Mar 27, 2010
    #10
  11. Lasse

    Guest

    On 27 Mar., 01:02, Ulf Samuelsson <> wrote:
    > TheM skrev:
    >
    >
    >
    > > "Nico Coesel" <> wrote in messagenews:...
    > >> "TheM" <> wrote:

    >
    > >>> "Spehro Pefhany" <> wrote in messagenews:...
    > >>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    > >>>> wrote:

    >
    > >>>>> "Peter" <> wrote in message
    > >>>>>news:...
    > >>>>>> They have doubled their prices and the lead times are 18 weeks.
    > >>> Is this limited to EEPROM/Memory only or uCPU as well?

    >
    > >>> Definitely worth considering getting out of AVR.
    > >>> Do NPX ARM come with on-chip FLASH?
    > >> Yes, all of them have 128 bit wide flash that allows zero waitstate
    > >> execution at the maximum CPU clock.

    >
    > > Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    > > From what I see they are same price as AVR mega, low power and much faster.
    > > And NXP is very generous with samples.

    >
    > > M

    >
    > The typical 32 bitters of today are implemented using advanced
    > flash technologies which allows high density memories in small chip
    > areas, but they are not low power.
    >
    > The inherent properties of the process makes for high leakage.
    > When you see power consumption in sleep of around 1-2 uA,
    > this is when the chip is turned OFF.
    > Only a small part of the chip is powered, RTC and a few other things.
    >
    > When you implement in a 0.25u process or higher, you can have the chip
    > fully initialized and ready to react on input while using
    > 1-2 uA in sleep.
    >
    > That is a big difference.
    >
    > While the NXP devices gets zero waitstate from 128 bit bus,
    > this also makes them extremely power hungry.
    > An LPC ARM7 uses about 2 x the current of a SAM7.
    > It gets higher performance in ARM mode.
    >
    > The ARM mode has a price in code size, so if you want more features,
    > then you better run in Thumb mode. The SAM7 with 32 bit flash is
    > actually faster than the LPC when running in Thumb mode,
    > (at the same frequency) since the SAM7 uses as 33 MHz flash,
    > while the LPC uses a 24 Mhz flash.
    > In thumb mode, the 32 bit access gives you two instructions
    > per cycle so in average this gives you 1 instruction per clock on the SAM7.
    >


    How does that make any sense? Whether an instruction is 16 or 32 bit,
    24 MHz * 128 bit is still more than 33 MHz * 32 bit ...

    snip

    >
    > Best Regards
    > Ulf Samuelsson


    -Lasse
     
    , Mar 27, 2010
    #11
  12. Lasse wrote:
    > On 27 Mar., 01:02, Ulf Samuelsson <> wrote:
    >> TheM skrev:
    >>
    >>
    >>
    >>> "Nico Coesel" <> wrote in messagenews:...
    >>>> "TheM" <> wrote:
    >>>>> "Spehro Pefhany" <> wrote in messagenews:...
    >>>>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>>>>> wrote:
    >>>>>>> "Peter" <> wrote in message
    >>>>>>> news:...
    >>>>>>>> They have doubled their prices and the lead times are 18 weeks.
    >>>>> Is this limited to EEPROM/Memory only or uCPU as well?
    >>>>> Definitely worth considering getting out of AVR.
    >>>>> Do NPX ARM come with on-chip FLASH?
    >>>> Yes, all of them have 128 bit wide flash that allows zero waitstate
    >>>> execution at the maximum CPU clock.
    >>> Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    >>> From what I see they are same price as AVR mega, low power and much faster.
    >>> And NXP is very generous with samples.
    >>> M

    >> The typical 32 bitters of today are implemented using advanced
    >> flash technologies which allows high density memories in small chip
    >> areas, but they are not low power.
    >>
    >> The inherent properties of the process makes for high leakage.
    >> When you see power consumption in sleep of around 1-2 uA,
    >> this is when the chip is turned OFF.
    >> Only a small part of the chip is powered, RTC and a few other things.
    >>
    >> When you implement in a 0.25u process or higher, you can have the chip
    >> fully initialized and ready to react on input while using
    >> 1-2 uA in sleep.
    >>
    >> That is a big difference.
    >>
    >> While the NXP devices gets zero waitstate from 128 bit bus,
    >> this also makes them extremely power hungry.
    >> An LPC ARM7 uses about 2 x the current of a SAM7.
    >> It gets higher performance in ARM mode.
    >>
    >> The ARM mode has a price in code size, so if you want more features,
    >> then you better run in Thumb mode. The SAM7 with 32 bit flash is
    >> actually faster than the LPC when running in Thumb mode,
    >> (at the same frequency) since the SAM7 uses as 33 MHz flash,
    >> while the LPC uses a 24 Mhz flash.
    >> In thumb mode, the 32 bit access gives you two instructions
    >> per cycle so in average this gives you 1 instruction per clock on the SAM7.
    >>

    >
    > how does that make any sense? wheter an instruction is 16 or 32bit,
    > 24MHz * 128bit is still more that 33MHz * 32 bit ...
    >

    When you run in Thumb mode with 1 waitstate, all instructions
    are 16 bit and the SAM7 memory controller fetches 32 bits at a time,
    so with prefetch there should always be zero waitstates
    for sequential fetches.

    For Thumb mode, you have several cases depending on processor speed.
    Figures are for (non-sequential/sequential) access.

                   LPC     SAM7
    <  24 MHz:     0/0     0/0     same speed
    24-33 MHz:     1/0     0/0     (SAM7 faster)
    33-48 MHz:     1/0     1/0     same speed
    48-66 MHz:     2/0     1/0     (SAM7 faster)

    So the LPC2xxx has to run at higher clock frequencies
    to match the SAM7S performance.
    The 128 bit memory is overkill for thumb mode and just
    wastes power.

    You really need to run in ARM mode for the 128-bit memory
    to make sense.

    You can try overclocking the SAM7S if you are not running
    over the full temperature range.
    48 MHz with zero waitstates seems to work OK, but not up to +85°C.
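
    For reference, the waitstate figures in the table above are what you
    program into the flash controller before raising the core clock.
    A minimal sketch; the register name, address and field position below
    are placeholders, not the real SAM7S/LPC definitions, so take the actual
    values from the device header and datasheet.

    #include <stdint.h>

    /* Placeholder flash-controller mode register and waitstate field. */
    #define FLASH_MODE_REG  (*(volatile uint32_t *)0xFFFFFF60u)
    #define FWS_SHIFT       8u
    #define FWS_MASK        (3u << FWS_SHIFT)

    static void set_flash_waitstates(uint32_t mck_hz)
    {
        /* Per the table above: 0 waitstates up to 33 MHz, 1 above that. */
        uint32_t fws = (mck_hz > 33000000u) ? 1u : 0u;
        FLASH_MODE_REG = (FLASH_MODE_REG & ~FWS_MASK) | (fws << FWS_SHIFT);
    }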



    > snip
    >
    >> Best Regards
    >> Ulf Samuelsson

    >
    > -Lasse
     
    Ulf Samuelsson, Mar 27, 2010
    #12
  13. Jon Kirwan

    Jon Kirwan Guest

    On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    <> wrote:

    ><snip of LPC2xxx vx SAM7S discussion>
    >The 128 bit memory is overkill for thumb mode and just
    >wastes power.
    ><snip>


    Ulf, let me remind you of something you wrote about the SAM7:

    "In thumb mode, the 32 bit access gives you two
    instructions per cycle so in average this gives
    you 1 instruction per clock on the SAM7."

    I gather this is regarding the case where there is 1 wait
    state reading the 32-bit flash line -- so 2 clocks per line
    and thus the 1 clock per 16-bit instruction (assuming it
    executes in 1 clock.)

    Nico's comment about the NXP ARM, about the 128-bit wide
    flash line-width, would (I imagine) work about the same
    except that it reads at full clock rate speeds, no wait
    states. So I gather, if it works similarly, that there are
    eight thumb instructions per line (roughly.) I take it your
    point is that since each instruction (things being equal)
    cannot execute faster than 1 clock per, that it takes 8
    clocks to execute those thumb instructions.

    The discussion could move between discussing instruction
    streams to discussing constant data tables and the like, but
    staying on the subject of instructions for the following....

    So the effect is that it takes the same number of clocks to
    execute 1-clock thumb instructions on either system?
    (Ignoring frequency, for now.) Or do I get that wrong?

    You then discussed power consumption issues. Wouldn't it be
    the case that since the NXP ARM is accessing its flash at a
    1/8th clock rate and the SAM7 is constantly operating its
    flash that the _average_ power consumption might very well be
    better with the NXP ARM, despite somewhat higher current when
    it is being accessed? Isn't the fact that the access cycle
    takes place far less frequently observed as a lower average?
    Perhaps the peak divided by 8, or so? (Again, keep the clock
    rates identical [downgraded to SAM7 rates in the NXP ARM
    case.]) Have you computed figures for both?

    Jon
     
    Jon Kirwan, Mar 27, 2010
    #13
  14. Nico Coesel

    Nico Coesel Guest

    Ulf Samuelsson <> wrote:

    >TheM skrev:
    >> "Nico Coesel" <> wrote in message news:...
    >>> "TheM" <> wrote:
    >>>
    >>>> "Spehro Pefhany" <> wrote in message news:...
    >>>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>>>> wrote:
    >>>>>
    >>>>>> "Peter" <> wrote in message
    >>>>>> news:...
    >>>>>>> They have doubled their prices and the lead times are 18 weeks.
    >>>> Is this limited to EEPROM/Memory only or uCPU as well?
    >>>>
    >>>> Definitely worth considering getting out of AVR.
    >>>> Do NPX ARM come with on-chip FLASH?
    >>> Yes, all of them have 128 bit wide flash that allows zero waitstate
    >>> execution at the maximum CPU clock.

    >>
    >> Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    >> From what I see they are same price as AVR mega, low power and much faster.
    >> And NXP is very generous with samples.
    >>
    >> M
    >>
    >>

    >
    >The typical 32 bitters of today are implemented using advanced
    >flash technologies which allows high density memories in small chip
    >areas, but they are not low power.
    >
    >The inherent properties of the process makes for high leakage.
    >When you see power consumption in sleep of around 1-2 uA,
    >this is when the chip is turned OFF.
    >Only a small part of the chip is powered, RTC and a few other things.
    >
    >When you implement in a 0.25u process or higher, you can have the chip
    >fully initialized and ready to react on input while using
    >1-2 uA in sleep.
    >
    >That is a big difference.
    >
    >While the NXP devices gets zero waitstate from 128 bit bus,
    >this also makes them extremely power hungry.
    >An LPC ARM7 uses about 2 x the current of a SAM7.
    >It gets higher performance in ARM mode.
    >
    >The ARM mode has a price in code size, so if you want more features,
    >then you better run in Thumb mode. The SAM7 with 32 bit flash is
    >actually faster than the LPC when running in Thumb mode,
    >(at the same frequency) since the SAM7 uses as 33 MHz flash,
    >while the LPC uses a 24 Mhz flash.
    >In thumb mode, the 32 bit access gives you two instructions
    >per cycle so in average this gives you 1 instruction per clock on the SAM7.


    I think this depends a lot on what method you use to measure this.
    Thumb code is expected to be slower than ARM code. You should test
    with Dhrystone and make sure the same C library is used, since Dhrystone
    results also depend on the C library!
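
    The ARM7 parts have no cycle counter, but on the Cortex-M3 devices that
    come up further down the thread the DWT cycle counter gives a
    C-library-independent measurement. A minimal sketch using the standard
    CMSIS core registers (the device header name and the routine being timed
    are placeholders):

    #include "LPC17xx.h"   /* any CMSIS Cortex-M3 device header */

    static void routine_under_test(void)
    {
        /* placeholder for the code being compared */
    }

    uint32_t time_cycles(void)
    {
        CoreDebug->DEMCR |= CoreDebug_DEMCR_TRCENA_Msk; /* enable DWT block  */
        DWT->CYCCNT = 0;
        DWT->CTRL  |= DWT_CTRL_CYCCNTENA_Msk;           /* start the counter */

        routine_under_test();

        return DWT->CYCCNT;                             /* elapsed cycles    */
    }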

    >Less waitstates means higher performance.
    >By copying a few 32 bit ARM routines to SRAM,
    >you can overcome that limitation.
    >You can get slightly higher top frequency out of the LPC,
    >but that again increases the power consumption.
    >
    >
    >For Cortex-M3 I did some test on the new SAM3, which can be
    >configured to use both 64 bit or 128 bit memories.
    >With a 128 bit memory, you can wring about 5% extra performance
    >out of the chip compared to 64 bit operation.
    >From a power consumption point of view it is probably better
    >to increase the clock frequency by 5% than to enable the 128 bit mode.
    >It is therefore only the most demanding applications that have
    >any use for the 128 bit memory.
    >
    >Testing on other Cortex-M3 chips indicate similar results.
    >
    >Someone told me that they tried executing out of SRAM on an STM32
    >and this was actually slower than executing out of flash.
    >Executing out of external memory also appears to be a problem,
    >since there is no cache/burst and bandwidth seems to be lower
    >than equivalent ARM7 devices.


    That doesn't surprise me. From my experience with STR7 and the STM32
    datasheets it seems ST does a sloppy job putting controllers together.
    They are cheap but you don't get maximum performance.

    >Current guess is that the AHB bus has some delays due to
    >synchronization. Also if you execute out of SRAM
    >you are going to have conflicts with data access.
    >Something which is avoided when you execute out of flash.


    NXP has some sort of cache between the CPU and the flash on the M3
    devices. According to the documentation, NXP's LPC1700 M3 devices use a
    Harvard architecture with 3 buses, so multiple data transfers
    (CPU-flash, CPU-memory and DMA) can occur simultaneously. Executing
    from RAM would occupy one bus, so you'll have less memory bandwidth to
    work with.
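
    One practical consequence: to benefit from the separate buses you keep
    code in flash and park DMA buffers in a separate SRAM bank, so the
    CPU's instruction and data buses stay free. A minimal GCC-style sketch;
    the ".ahb_sram" section name is an assumption that has to be mapped to
    that RAM bank in the linker script.

    #include <stdint.h>

    __attribute__((section(".ahb_sram"), aligned(4)))
    static uint8_t dma_rx_buf[512];   /* filled by the DMA controller  */

    __attribute__((section(".ahb_sram"), aligned(4)))
    static uint8_t dma_tx_buf[512];   /* drained by the DMA controller */

    /* The CPU fetches from flash and works on data in the main SRAM while
     * the DMA controller moves bytes through these buffers on its own bus. */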

    --
    Failure does not prove something is impossible, failure simply
    indicates you are not using the right tools...
    nico@nctdevpuntnl (punt=.)
    --------------------------------------------------------------
     
    Nico Coesel, Mar 27, 2010
    #14
  15. Jon Kirwan wrote:
    > On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    > <> wrote:
    >
    >> <snip of LPC2xxx vx SAM7S discussion>
    >> The 128 bit memory is overkill for thumb mode and just
    >> wastes power.
    >> <snip>

    >
    > Ulf, let me remind you of something you wrote about the SAM7:
    >
    > "In thumb mode, the 32 bit access gives you two
    > instructions per cycle so in average this gives
    > you 1 instruction per clock on the SAM7."
    >
    > I gather this is regarding the case where there is 1 wait
    > state reading the 32-bit flash line -- so 2 clocks per line
    > and thus the 1 clock per 16-bit instruction (assuming it
    > executes in 1 clock.)
    >
    > Nico's comment about the NPX ARM, about the 128-bit wide
    > flash line-width, would (I imagine) work about the same
    > except that it reads at full clock rate speeds, no wait
    > states. So I gather, if it works similarly, that there are
    > eight thumb instructions per line (roughly.) I take it your
    > point is that since each instruction (things being equal)
    > cannot execute faster than 1 clock per, that it takes 8
    > clocks to execute those thumb instructions.
    >


    Yes, the SAM7 is very nicely tuned to thumb mode.
    The LPC2 provides much more bandwidth than is needed
    when you run in thumb mode.
    Due to the LPC's higher latency, caused by its slower
    flash, the SAM7 will be better at certain frequencies,
    but the LPC will have a higher max clock frequency.

    The real point is that you are not necessarily
    faster because you have a wide memory.
    The speed of the memory counts as well.
    There are a lot of parameters to take into account
    if you want to find the best part.

    People with different requirements will find different
    parts to be the best.

    If you start to use high speed communications, then the
    PDC of the SAM7 serial ports tend to even out any
    difference in performance vs the LPC very quickly.


    > The discussion could move between discussing instruction
    > streams to discussing constant data tables and the like, but
    > staying on the subject of instructions for the following....


    Yes, this will have an effect.
    Accessing a random word should be faster on the SAM7,
    and if you copy a large area sequentially,
    having the 128-bit memory will be beneficial.

    >
    > So the effect is that it takes the same number of clocks to
    > execute 1-clock thumb instructions on either system?
    > (Ignoring frequency, for now.) Or do I get that wrong?


    Yes, the LPC will at certain frequencies have longer latency,
    so it will be marginally slower in Thumb mode.

    >
    > You then discussed power consumption issues. Wouldn't it be
    > the case that since the NPX ARM is accessing its flash at a
    > 1/8th clock rate and the SAM7 is constantly operating its
    > flash that the _average_ power consumption might very well be
    > better with the NPX ARM, despite somewhat higher current when
    > it is being accessed? Isn't the fact that the access cycle
    > takes place far less frequently observed as a lower average?


    As far as I understand the chip select for the internal flash
    is always active when you run at higher frequencies
    so there is a lot of wasted power.

    > Perhaps the peak divided by 8, or so? (Again, keep the clock
    > rates identical [downgraded to SAM7 rates in the NXP ARM
    > case.]) Have you computed figures for both?


    Best is to check the datasheet.
    The CPU core used is another important parameter.
    The SAM7S uses the ARM7TDMI while most others use the ARM7TDMI-S
    (S = synthesizable), which inherently has 33% higher power consumption.



    >
    > Jon
     
    Ulf Samuelsson, Mar 27, 2010
    #15
  16. Nico Coesel wrote:
    > Ulf Samuelsson <> wrote:
    >
    >> TheM skrev:
    >>> "Nico Coesel" <> wrote in message news:...
    >>>> "TheM" <> wrote:
    >>>>
    >>>>> "Spehro Pefhany" <> wrote in message news:...
    >>>>>> On Thu, 25 Mar 2010 13:19:46 -0800, "Bob Eld" <>
    >>>>>> wrote:
    >>>>>>
    >>>>>>> "Peter" <> wrote in message
    >>>>>>> news:...
    >>>>>>>> They have doubled their prices and the lead times are 18 weeks.
    >>>>> Is this limited to EEPROM/Memory only or uCPU as well?
    >>>>>
    >>>>> Definitely worth considering getting out of AVR.
    >>>>> Do NPX ARM come with on-chip FLASH?
    >>>> Yes, all of them have 128 bit wide flash that allows zero waitstate
    >>>> execution at the maximum CPU clock.
    >>> Not bad, I ordered a couple books on ARM off Amazon, may get into it finally.
    >>> From what I see they are same price as AVR mega, low power and much faster.
    >>> And NXP is very generous with samples.
    >>>
    >>> M
    >>>
    >>>

    >> The typical 32 bitters of today are implemented using advanced
    >> flash technologies which allows high density memories in small chip
    >> areas, but they are not low power.
    >>
    >> The inherent properties of the process makes for high leakage.
    >> When you see power consumption in sleep of around 1-2 uA,
    >> this is when the chip is turned OFF.
    >> Only a small part of the chip is powered, RTC and a few other things.
    >>
    >> When you implement in a 0.25u process or higher, you can have the chip
    >> fully initialized and ready to react on input while using
    >> 1-2 uA in sleep.
    >>
    >> That is a big difference.
    >>
    >> While the NXP devices gets zero waitstate from 128 bit bus,
    >> this also makes them extremely power hungry.
    >> An LPC ARM7 uses about 2 x the current of a SAM7.
    >> It gets higher performance in ARM mode.
    >>
    >> The ARM mode has a price in code size, so if you want more features,
    >> then you better run in Thumb mode. The SAM7 with 32 bit flash is
    >> actually faster than the LPC when running in Thumb mode,
    >> (at the same frequency) since the SAM7 uses as 33 MHz flash,
    >> while the LPC uses a 24 Mhz flash.
    >> In thumb mode, the 32 bit access gives you two instructions
    >> per cycle so in average this gives you 1 instruction per clock on the SAM7.

    >
    > I think this depends a lot on what method you use to measure this.
    > Thumb code is expected to be slower than ARM code. You should test
    > with drystone and make sure the same C library is used since drystone
    > results also depend on the C library!


    It is pretty clear that if you
    * execute out of flash in Thumb mode,
    * do not access flash for data transfers,
    * run the chips at equivalent frequencies, and
    * run sequential fetches at zero waitstates,

    then the difference will be the number of waitstates for non-sequential
    fetches.



    >
    >> Less waitstates means higher performance.
    >> By copying a few 32 bit ARM routines to SRAM,
    >> you can overcome that limitation.
    >> You can get slightly higher top frequency out of the LPC,
    >> but that again increases the power consumption.
    >>
    >>
    >> For Cortex-M3 I did some test on the new SAM3, which can be
    >> configured to use both 64 bit or 128 bit memories.
    >> With a 128 bit memory, you can wring about 5% extra performance
    >> out of the chip compared to 64 bit operation.
    >>From a power consumption point of view it is probably better
    >> to increase the clock frequency by 5% than to enable the 128 bit mode.
    >> It is therefore only the most demanding applications that have
    >> any use for the 128 bit memory.
    >>
    >> Testing on other Cortex-M3 chips indicate similar results.
    >>
    >> Someone told me that they tried executing out of SRAM on an STM32
    >> and this was actually slower than executing out of flash.
    >> Executing out of external memory also appears to be a problem,
    >> since there is no cache/burst and bandwidth seems to be lower
    >> than equivalent ARM7 devices.

    >
    > That doesn't surprise me. From my experience with STR7 and the STM32
    > datasheets it seems ST does a sloppy job putting controllers together.
    > They are cheap but you don't get maximum performance.
    >
    >> Current guess is that the AHB bus has some delays due to
    >> synchronization. Also if you execute out of SRAM
    >> you are going to have conflicts with data access.
    >> Something which is avoided when you execute out of flash.

    >
    > NXP has some sort of cache between the CPU and the flash on the M3
    > devices. According to the documentation NXP's LPC1700 M3 devices use a
    > Harvard architecture with 3 busses so multiple data transfers
    > (CPU-flash, CPU-memory and DMA) can occur simultaneously. Executing
    > from RAM would occupy one bus so you'll have less memory bandwidth to
    > work with.
    >


    The SAM3 uses the same AHB bus as the ARM9.
    The "bus" is actually a series of multiplexers where each target
    has a multiplexer with an input for each bus master.

    As long as no one else wants to access the same target,
    a bus master will get unrestricted access.

    If you execute from flash, you will get full access on the instruction
    bus (with the exception of the few constant fetches).
    If you execute out of a single SRAM, you have to share access
    with the data transfers, which will slow you down.

    BR
    Ulf Samuelsson
     
    Ulf Samuelsson, Mar 27, 2010
    #16
  17. Jon Kirwan

    Jon Kirwan Guest

    On Sat, 27 Mar 2010 14:14:58 +0100, Ulf Samuelsson
    <> wrote:

    >Jon Kirwan skrev:
    >> On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    >> <> wrote:
    >>
    >>> <snip of LPC2xxx vx SAM7S discussion>
    >>> The 128 bit memory is overkill for thumb mode and just
    >>> wastes power.
    >>> <snip>

    >>
    >> Ulf, let me remind you of something you wrote about the SAM7:
    >>
    >> "In thumb mode, the 32 bit access gives you two
    >> instructions per cycle so in average this gives
    >> you 1 instruction per clock on the SAM7."
    >>
    >> I gather this is regarding the case where there is 1 wait
    >> state reading the 32-bit flash line -- so 2 clocks per line
    >> and thus the 1 clock per 16-bit instruction (assuming it
    >> executes in 1 clock.)
    >>
    >> Nico's comment about the NPX ARM, about the 128-bit wide
    >> flash line-width, would (I imagine) work about the same
    >> except that it reads at full clock rate speeds, no wait
    >> states. So I gather, if it works similarly, that there are
    >> eight thumb instructions per line (roughly.) I take it your
    >> point is that since each instruction (things being equal)
    >> cannot execute faster than 1 clock per, that it takes 8
    >> clocks to execute those thumb instructions.

    >
    >Yes, the SAM7 is very nicely tuned to thumb mode.
    >The LPC2 provides much more bandwidth than is needed
    >when you run in thumb mode.


    I think I gathered that much and didn't disagree, just
    wondered.

    >Due to the higher latency for the LPC, due to slower
    >flash, the SAM7 will be better at certain frequencies,
    >but the LPC will have a higher max clock frequency.


    I remember you writing that "SAM7 uses a 33 MHz flash, while
    the LPC uses a 24 Mhz flash." It seems hard to imagine,
    though, except perhaps for data fetch situations or
    branching, it being actually slower. If it fetches something
    like 8 thumb instructions at a time, anyway. As another
    poster pointed out, the effective rate is much higher for
    sequential reads no matter how you look at it. So it would
    take branching or non-sequential data fetches to highlight
    the difference.

    One would have to do an exhaustive, stochastic analysis of
    application spaces to get a good bead on all this. But
    ignorant of the details as I truly am right now, not having a
    particular application in mind and just guessing where I'd
    put my money if betting one way or another, I'd put it on 384
    MB/sec memory over 132 MB/sec memory for net throughput.

    >The real point is that you are not neccessarily
    >faster


    Yes, but the key here is the careful "not necessarily"
    wording. Not necessarily, is true enough, as one could form
    specific circumstances where you'd be right. But it seems to
    me they'd be more your 'corner cases' than 'run of the mill.'

    >because you have a wide memory.
    >The speed of the memory counts as well.


    Of course. So people who seem to care about the final speed
    and little else should indeed do some analysis before
    deciding. But if they don't know their application well
    enough to make that comparison... hmm.

    >There are a lot of parameters to take into account
    >if you want to get find the best part.


    Yes. That seems to ever be true!

    >People with different requirements will find different
    >parts to be the best.


    Yes, no argument. I was merely curious about something else
    which you mostly didn't answer, so I suppose if I care enough
    I will have to go find out on my own.... see below.

    >If you start to use high speed communications, then the
    >PDC of the SAM7 serial ports tend to even out any
    >difference in performance vs the LPC very quickly.


    Some parts have such wonderfully sophisticated peripherals.
    Some of these are almost ancient (68332, for example.) So
    it's not only a feature of new parts, either. Which goes
    back to your point that there are a lot of parameters to take
    into account, I suppose.

    >> The discussion could move between discussing instruction
    >> streams to discussing constant data tables and the like, but
    >> staying on the subject of instructions for the following....

    >
    >Yes, this will have an effect.
    >Accessing a random word should be faster on the SAM7
    >and, assuming you copy sequentially a large area
    >having 128 bit memory will be beneficial.


    The 'random' part being important here. In some cases, that
    may be important where the structures are 'const' and can be
    stored in flash and are accessed in a way that cannot take
    advantage of the 128-bit wide lines. A binary search on a
    calibration table with small table entry sizes, perhaps,
    might be a reasonable example that actually occurs often
    enough and may show off your point well. Other examples,
    such as larger element sizes (such as doubles or pairs of
    doubles) for that binary search or a FIR filter table used
    sequentially, might point the other way.

    >> So the effect is that it takes the same number of clocks to
    >> execute 1-clock thumb instructions on either system?
    >> (Ignoring frequency, for now.) Or do I get that wrong?

    >
    >Yes, the LPC will in certain frequencies hjave longer latency
    >so it will be marginally slower in thumb mode.


    I find this tough to stomach when talking about instruction
    streams, unless there are lots of branches salted in the mix.
    I know I must have read somewhere someone's analysis of many
    programs and the upshot of this, but I think it was for the
    x86 system and a product of Intel's research department some
    years ago, and I've no idea how well that applies to the ARM
    core. I'm sure someone (perhaps you?) has access to such
    analyses and might share them here?

    >> You then discussed power consumption issues. Wouldn't it be
    >> the case that since the NPX ARM is accessing its flash at a
    >> 1/8th clock rate and the SAM7 is constantly operating its
    >> flash that the _average_ power consumption might very well be
    >> better with the NPX ARM, despite somewhat higher current when
    >> it is being accessed? Isn't the fact that the access cycle
    >> takes place far less frequently observed as a lower average?

    >
    >As far as I understand the chip select for the internal flash
    >is always active when you run at higher frequencies
    >so there is a lot of wasted power.


    By "at higher frequencies" do you have a particular number
    above which your comment applies and below which it does not?

    In any case, this is the answer I was looking for and you
    don't appear to answer now. Why would anyone "run the flash"
    when the bus isn't active? It seems.... well, bone-headed.
    And I can't recall any chip design being that poor. I've
    seen cases where an external board design (not done by chip
    designers, but more your hobbyist designer type) did
    things like that. But it is hard for me to imagine a chip
    designer being that stupid. It's almost zero work to be
    smarter than that.

    So this suggests you want me to go study the situation. Maybe
    someone already knows, though, and can post it. I can hope.

    >> Perhaps the peak divided by 8, or so? (Again, keep the clock
    >> rates identical [downgraded to SAM7 rates in the NXP ARM
    >> case.]) Have you computed figures for both?

    >
    >Best is to check the datasheet.


    I wondered if you already knew the answer. I suppose not,
    now.

    >The CPU core used is another important parameter.
    >The SAM7S uses the ARM7TDMI while most other uses the ARM7TDMI-S
    >(S = synthesizable) which inherently has 33 % higher power consumption.


    I'm aware of the general issue. Your use of "most other"
    does NOT address itself to the subject at hand, though. It
    leaves open either possibility for the LPC2. But it's a
    point worth keeping in mind if you make these chips, I
    suppose. For the rest of us, it's just a matter of deciding
    which works better by examining the data sheet. We don't
    have the option to move a -S design to a crafted ASIC.

    So this leaves some more or less interesting questions.

    (1) Where is a quality report or two on the subject of
    instruction mix for ARM applications, broken down by
    application spaces that differ substantially from each other,
    and what are the results of these studies?

    (2) Does the LPC2 device really operate the flash all the
    time? Or not?

    (3) Is the LPC2 a -S (which doesn't matter that much, but
    since the topic is brought up it might be nice to put that to
    bed?)

    I don't know.

    Jon
     
    Jon Kirwan, Mar 27, 2010
    #17
  18. Jon Kirwan wrote:
    > On Sat, 27 Mar 2010 14:14:58 +0100, Ulf Samuelsson
    > <> wrote:
    >
    >> Jon Kirwan skrev:
    >>> On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    >>> <> wrote:
    >>>
    >>>> <snip of LPC2xxx vx SAM7S discussion>
    >>>> The 128 bit memory is overkill for thumb mode and just
    >>>> wastes power.
    >>>> <snip>
    >>> Ulf, let me remind you of something you wrote about the SAM7:
    >>>
    >>> "In thumb mode, the 32 bit access gives you two
    >>> instructions per cycle so in average this gives
    >>> you 1 instruction per clock on the SAM7."
    >>>
    >>> I gather this is regarding the case where there is 1 wait
    >>> state reading the 32-bit flash line -- so 2 clocks per line
    >>> and thus the 1 clock per 16-bit instruction (assuming it
    >>> executes in 1 clock.)
    >>>
    >>> Nico's comment about the NPX ARM, about the 128-bit wide
    >>> flash line-width, would (I imagine) work about the same
    >>> except that it reads at full clock rate speeds, no wait
    >>> states. So I gather, if it works similarly, that there are
    >>> eight thumb instructions per line (roughly.) I take it your
    >>> point is that since each instruction (things being equal)
    >>> cannot execute faster than 1 clock per, that it takes 8
    >>> clocks to execute those thumb instructions.

    >> Yes, the SAM7 is very nicely tuned to thumb mode.
    >> The LPC2 provides much more bandwidth than is needed
    >> when you run in thumb mode.

    >
    > I think I gathered that much and didn't disagree, just
    > wondered.
    >
    >> Due to the higher latency for the LPC, due to slower
    >> flash, the SAM7 will be better at certain frequencies,
    >> but the LPC will have a higher max clock frequency.

    >
    > I remember you writing that "SAM7 uses a 33 MHz flash, while
    > the LPC uses a 24 Mhz flash." It seems hard to imagine,
    > though, except perhaps for data fetch situations or
    > branching, it being actually slower. If it fetches something
    > like 8 thumb instructions at a time, anyway. As another
    > poster pointed out, the effective rate is much higher for
    > sequential reads no matter how you look at it. So it would
    > take branching or non-sequential data fetches to highlight
    > the difference.
    >
    > One would have to do an exhaustive, stochastic analysis of
    > application spaces to get a good bead on all this. But
    > ignorant of the details as I truly am right now, not having a
    > particular application in mind and just guessing where I'd
    > put my money if betting one way or another, I'd put it on 384
    > mb/sec memory over 132 mb/sec memory for net throughput.


    That is because you ignore the bottleneck caused by the fact that the
    ARM7 core only fetches 16 bits per access in Thumb mode.
    At 33 MHz, the CPU can only use 66 MB/second;
    at 66 MHz, the CPU can only use 132 MB/second.
    Since you can sustain 132 MB/second with a 33 MHz 32-bit memory,
    you do not need it to be wider to keep the pipeline running
    at zero waitstates for sequential fetches.
    For non-sequential fetches, the width is not important,
    only the number of waitstates, and the SAM7 has the same or fewer
    waitstates than the LPC.
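
    Spelling the arithmetic out (a worked-numbers sketch, nothing
    hardware-specific; a Thumb-mode ARM7 consumes 2 bytes of instruction
    per core clock):

    #include <stdio.h>

    int main(void)
    {
        unsigned need_33   = 33u * 2u;   /*  66 MB/s needed at a 33 MHz core */
        unsigned need_66   = 66u * 2u;   /* 132 MB/s needed at a 66 MHz core */
        unsigned flash_32  = 33u * 4u;   /* 132 MB/s from 33 MHz x 32 bits   */
        unsigned flash_128 = 24u * 16u;  /* 384 MB/s from 24 MHz x 128 bits  */

        printf("%u %u %u %u MB/s\n", need_33, need_66, flash_32, flash_128);
        return 0;
    }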

    ----
    The 128 bit memory is really only useful for ARM mode.
    For thumb mode it is more or less a waste.

    >
    >> The real point is that you are not neccessarily
    >> faster

    >
    > Yes, but the key here is the careful "not necessarily"
    > wording. Not necessarily, is true enough, as one could form
    > specific circumstances where you'd be right. But it seems to
    > me they'd be more your 'corner cases' than 'run of the mill.'


    I don't think running in Thumb mode is a corner case.


    >
    >> because you have a wide memory.
    >> The speed of the memory counts as well.

    >
    > Of course. So people who seem to care about the final speed
    > and little else should indeed do some analysis before
    > deciding. But if they don't know their application well
    > enough to make that comparison... hmm.
    >
    >> There are a lot of parameters to take into account
    >> if you want to get find the best part.

    >
    > Yes. That seems to ever be true!
    >
    >> People with different requirements will find different
    >> parts to be the best.

    >
    > Yes, no argument. I was merely curious about something else
    > which you mostly didn't answer, so I suppose if I care enough
    > I will have to go find out on my own.... see below.
    >
    >> If you start to use high speed communications, then the
    >> PDC of the SAM7 serial ports tend to even out any
    >> difference in performance vs the LPC very quickly.

    >
    > Some parts have such wonderfully sophisticated peripherals.
    > Some of these are almost ancient (68332, for example.) So
    > it's not only a feature of new parts, either. Which goes
    > back to your point that there are a lot of parameters to take
    > into account, I suppose.
    >
    >>> The discussion could move between discussing instruction
    >>> streams to discussing constant data tables and the like, but
    >>> staying on the subject of instructions for the following....

    >> Yes, this will have an effect.
    >> Accessing a random word should be faster on the SAM7
    >> and, assuming you copy sequentially a large area
    >> having 128 bit memory will be beneficial.

    >
    > The 'random' part being important here. In some cases, that
    > may be important where the structures are 'const' and can be
    > stored in flash and are accessed in a way that cannot take
    > advantage of the 128-bit wide lines. A binary search on a
    > calibration table with small table entry sizes, perhaps,
    > might be a reasonable example that actually occurs often
    > enough and may show off your point well. Other examples,
    > such as larger element sizes (such as doubles or pairs of
    > doubles) for that binary search or a FIR filter table used
    > sequentially, might point the other way.
    >
    >>> So the effect is that it takes the same number of clocks to
    >>> execute 1-clock thumb instructions on either system?
    >>> (Ignoring frequency, for now.) Or do I get that wrong?

    >> Yes, the LPC will in certain frequencies hjave longer latency
    >> so it will be marginally slower in thumb mode.

    >
    > I find this tough to stomach, when talking about instruction
    > streams Unless there are lots of branches salted in the mix.
    > I know I must have read somewhere someone's analysis of many
    > programs and the upshot of this, but I think it was for the
    > x86 system and a product of Intel's research department some
    > years ago and I've no idea how well that applies to the ARM
    > core. I'm sure someone (perhaps you?) has access to such
    > anaylses and might share it here?


    LPC with 1 waitstate at 33 MHz:

    NOP 2 (fetches 8 instructions)
    NOP 1
    NOP 1
    NOP 1
    NOP 1
    NOP 1
    NOP 1
    NOP 1
    ..........
    Sum = 9

    Same code on the SAM7, 0 waitstates at 33 MHz:

    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    NOP 1 (fetches 1 instruction)
    ..........
    Sum = 8

    It should not be too hard to grasp.


    >
    >>> You then discussed power consumption issues. Wouldn't it be
    >>> the case that since the NPX ARM is accessing its flash at a
    >>> 1/8th clock rate and the SAM7 is constantly operating its
    >>> flash that the _average_ power consumption might very well be
    >>> better with the NPX ARM, despite somewhat higher current when
    >>> it is being accessed? Isn't the fact that the access cycle
    >>> takes place far less frequently observed as a lower average?

    >> As far as I understand the chip select for the internal flash
    >> is always active when you run at higher frequencies
    >> so there is a lot of wasted power.

    >
    > By "at higher frequencies" do you have a particular number
    > above which your comment applies and below which it does not?


    Each chip designer makes their own choices.
    I know of some chips that start strobing the flash
    chip select when running below 1-4 MHz.


    >
    > In any case, this is the answer I was looking for and you
    > don't appear to answer now. Why would anyone "run the flash"
    > when the bus isn't active? It seems.... well, bone-headed.
    > And I can't recall any chip design being that poor. I've
    > seen cases where an external board design (not done by chip
    > designers, but more your hobbyist designer type) that did
    > things like that. But it is hard for me to imagine a chip
    > designer being that stupid. It's almost zero work to be
    > smarter than that.


    This is an automatic thing which measures the clock frequency
    vs another clock frequency, and the "other" clock frequency
    is often not that quick.


    >
    > So this suggests you want me to go study the situation. Maybe
    > someone already knows, though, and can post it. I can hope.
    >
    >>> Perhaps the peak divided by 8, or so? (Again, keep the clock
    >>> rates identical [downgraded to SAM7 rates in the NXP ARM
    >>> case.]) Have you computed figures for both?

    >> Best is to check the datasheet.

    >
    > I wondered if you already knew the answer. I suppose not,
    > now.


    Looking at the LPC2141 datasheet, which seems to be the part
    closest to the SAM7S256, you get
    57 mA @ 3.3 V = 188 mW @ 60 MHz = 3.135 mW/MHz.

    The SAM7S datasheet gives 33 mA @ 3.3 V @ 55 MHz = 1.98 mW/MHz.
    You can, on the SAM7S, choose to feed VDDCORE from 1.8 V.

    The SAM7S is specified with USB enabled, so this
    has to be used for the LPC as well for a fair comparison.
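
    The same figures as a worked calculation (just the arithmetic quoted
    above, nothing taken from elsewhere):

    #include <stdio.h>

    int main(void)
    {
        double lpc2141 = 57.0 * 3.3 / 60.0; /* 57 mA @ 3.3 V, 60 MHz -> ~3.14 mW/MHz */
        double sam7s   = 33.0 * 3.3 / 55.0; /* 33 mA @ 3.3 V, 55 MHz -> ~1.98 mW/MHz */

        printf("LPC2141 %.2f mW/MHz, SAM7S %.2f mW/MHz\n", lpc2141, sam7s);
        return 0;
    }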

    >> The CPU core used is another important parameter.
    >> The SAM7S uses the ARM7TDMI while most other uses the ARM7TDMI-S
    >> (S = synthesizable) which inherently has 33 % higher power consumption.

    >
    > I'm aware of the general issue. Your use of "most other"
    > does NOT address itself to the subject at hand, though. It
    > leaves open either possibility for the LPC2. But it's a
    > point worth keeping in mind if you make these chips, I
    > suppose. For the rest of us, it's just a matter of deciding
    > which works better by examining the data sheet. We don't
    > have the option to move a -S design to a crafted ASIC.
    >
    > So this leaves some more or less interesting questions.
    >
    > (1) Where is a quality report or two on the subject of
    > instruction mix for ARM applications, broken down by
    > application spaces that differ substantially from each other,
    > and what are the results of these studies?
    >
    > (2) Does the LPC2 device really operate the flash all the
    > time? Or not?
    >


    There are no figures in the datasheet indicating a low-power mode.

    > (3) Is the LPC2 a -S (which doesn't matter that much, but
    > since the topic is brought up it might be nice to put that to
    > bed?)


    Yes it is.
    It should be enough to look in the datasheet.


    > I don't know.
    >
    > Jon


    Ulf
     
    Ulf Samuelsson, Mar 28, 2010
    #18
  19. Jon Kirwan

    Jon Kirwan Guest

    On Sun, 28 Mar 2010 01:04:20 +0100, Ulf Samuelsson
    <> wrote:

    >Jon Kirwan skrev:
    >> On Sat, 27 Mar 2010 14:14:58 +0100, Ulf Samuelsson
    >> <> wrote:
    >>
    >>> Jon Kirwan skrev:
    >>>> On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    >>>> <> wrote:
    >>>>
    >>>>> <snip of LPC2xxx vx SAM7S discussion>
    >>>>> The 128 bit memory is overkill for thumb mode and just
    >>>>> wastes power.
    >>>>> <snip>
    >>>> Ulf, let me remind you of something you wrote about the SAM7:
    >>>>
    >>>> "In thumb mode, the 32 bit access gives you two
    >>>> instructions per cycle so in average this gives
    >>>> you 1 instruction per clock on the SAM7."
    >>>>
    >>>> I gather this is regarding the case where there is 1 wait
    >>>> state reading the 32-bit flash line -- so 2 clocks per line
    >>>> and thus the 1 clock per 16-bit instruction (assuming it
    >>>> executes in 1 clock.)
    >>>>
    >>>> Nico's comment about the NPX ARM, about the 128-bit wide
    >>>> flash line-width, would (I imagine) work about the same
    >>>> except that it reads at full clock rate speeds, no wait
    >>>> states. So I gather, if it works similarly, that there are
    >>>> eight thumb instructions per line (roughly.) I take it your
    >>>> point is that since each instruction (things being equal)
    >>>> cannot execute faster than 1 clock per, that it takes 8
    >>>> clocks to execute those thumb instructions.
    >>> Yes, the SAM7 is very nicely tuned to thumb mode.
    >>> The LPC2 provides much more bandwidth than is needed
    >>> when you run in thumb mode.

    >>
    >> I think I gathered that much and didn't disagree, just
    >> wondered.
    >>
    >>> Due to the higher latency for the LPC, due to slower
    >>> flash, the SAM7 will be better at certain frequencies,
    >>> but the LPC will have a higher max clock frequency.

    >>
    >> I remember you writing that "SAM7 uses a 33 MHz flash, while
    >> the LPC uses a 24 Mhz flash." It seems hard to imagine,
    >> though, except perhaps for data fetch situations or
    >> branching, it being actually slower. If it fetches something
    >> like 8 thumb instructions at a time, anyway. As another
    >> poster pointed out, the effective rate is much higher for
    >> sequential reads no matter how you look at it. So it would
    >> take branching or non-sequential data fetches to highlight
    >> the difference.
    >>
    >> One would have to do an exhaustive, stochastic analysis of
    >> application spaces to get a good bead on all this. But
    >> ignorant of the details as I truly am right now, not having a
    >> particular application in mind and just guessing where I'd
    >> put my money if betting one way or another, I'd put it on 384
    >> MB/sec memory over 132 MB/sec memory for net throughput.

    >
    >That is because you ignore the congestion caused by the fact
    >that the ARM7 core only fetches 16 bits per access in thumb mode.


    I'm not entirely sure I understand. If both processors are
    internally clocked at the same rate, they both have exactly
    the same fetch rate in thumb mode.

    >At 33 MHz, the CPU can only use 66 MB / second,
    >At 66 MHz, the CPU can only use 132 MB / second.


    Okay. I'm with you. Except that I haven't looked at the
    data sheets to check for maximum core clock rates, since that
    might bear on some questions.

    >Since you can sustain 132 MB / second with a 33 MHz 32 bit memory,
    >you do not need it to be wider to keep the pipeline running
    >at zero waitstates for sequential fetch.


    In thumb mode and only talking about instructions and
    assuming a 66 MHz peak. Do the processors (either of them)
    sport separate buses, though, which can compete for the same
    memory system? (Data + Instruction paths, for example.)
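
    For the sequential-fetch claim quoted above, the arithmetic is
    simple enough to write down. A minimal sketch, assuming one 16-bit
    Thumb fetch per core clock and the 33 MHz, 32-bit flash figure
    used in this thread (the 66 MHz core clock is an assumption):

    #include <stdio.h>

    /* Sequential Thumb fetch: what the flash can supply vs what the
     * core can consume, using the figures quoted in the thread. */
    int main(void)
    {
        double flash_hz    = 33e6;   /* SAM7 flash access rate (quoted)   */
        double flash_bytes = 4.0;    /* 32-bit line per access            */
        double core_hz     = 66e6;   /* assumed core clock                */
        double thumb_bytes = 2.0;    /* one 16-bit Thumb fetch per clock  */

        double supply = flash_hz * flash_bytes / 1e6;   /* 132 MB/s */
        double demand = core_hz * thumb_bytes / 1e6;    /* 132 MB/s */

        printf("flash can supply : %.0f MB/s\n", supply);
        printf("core can consume : %.0f MB/s\n", demand);
        /* Supply matches demand, so purely sequential Thumb code sees
         * no waitstates on the narrower but faster memory. */
        return 0;
    }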

    >For non-sequential fetch, the width is not important.


    In the case of instructions, I think I take your meaning.
    Regarding data, no, I don't.

    >Only the number of waitstates, and the SAM7 has same or less # of
    >waitstates than the LPC.


    .... In the case of non-sequential instruction fetch.

    All this still fails to account for actual application mix
    reports. I'm still curious (and I'm absolutely positive that
    this is _done_ by chip designers because I observed the sheer
    magnitude of the effort that took place at Intel during the
    P2 design period) about application analysis that must have
    been done on ARM (32-bit, 16-bit, and mixed modes) and should
    be available somewhere. Do you have access to such reports?
    It might go a long way in clarifying your points.

    >----
    >The 128 bit memory is really only useful for ARM mode.
    >For thumb mode it is more or less a waste.
    >
    >>> The real point is that you are not necessarily
    >>> faster

    >>
    >> Yes, but the key here is the careful "not necessarily"
    >> wording. Not necessarily, is true enough, as one could form
    >> specific circumstances where you'd be right. But it seems to
    >> me they'd be more your 'corner cases' than 'run of the mill.'

    >
    >I dont think running in Thumb mode is a corner case.


    Actually, I meant this plural, not singular. And I don't
    have a perspective on actual applications in these spaces. So
    I'll just plead mostly ignorance here and hold off saying
    more, as I'm mostly trying to understand, not claim, things.

    >>> because you have a wide memory.
    >>> The speed of the memory counts as well.

    >>
    >> Of course. So people who seem to care about the final speed
    >> and little else should indeed do some analysis before
    >> deciding. But if they don't know their application well
    >> enough to make that comparison... hmm.
    >>
    >>> There are a lot of parameters to take into account
    >>> if you want to find the best part.

    >>
    >> Yes. That seems to ever be true!
    >>
    >>> People with different requirements will find different
    >>> parts to be the best.

    >>
    >> Yes, no argument. I was merely curious about something else
    >> which you mostly didn't answer, so I suppose if I care enough
    >> I will have to go find out on my own.... see below.
    >>
    >>> If you start to use high speed communications, then the
    >>> PDC of the SAM7 serial ports tend to even out any
    >>> difference in performance vs the LPC very quickly.

    >>
    >> Some parts have such wonderfully sophisticated peripherals.
    >> Some of these are almost ancient (68332, for example.) So
    >> it's not only a feature of new parts, either. Which goes
    >> back to your point that there are a lot of parameters to take
    >> into account, I suppose.
    >>
    >>>> The discussion could move between discussing instruction
    >>>> streams to discussing constant data tables and the like, but
    >>>> staying on the subject of instructions for the following....
    >>> Yes, this will have an effect.
    >>> Accessing a random word should be faster on the SAM7
    >>> and, assuming you copy sequentially a large area
    >>> having 128 bit memory will be beneficial.

    >>
    >> The 'random' part being important here. In some cases, that
    >> may be important where the structures are 'const' and can be
    >> stored in flash and are accessed in a way that cannot take
    >> advantage of the 128-bit wide lines. A binary search on a
    >> calibration table with small table entry sizes, perhaps,
    >> might be a reasonable example that actually occurs often
    >> enough and may show off your point well. Other examples,
    >> such as larger element sizes (such as doubles or pairs of
    >> doubles) for that binary search or a FIR filter table used
    >> sequentially, might point the other way.
    >>
    >>>> So the effect is that it takes the same number of clocks to
    >>>> execute 1-clock thumb instructions on either system?
    >>>> (Ignoring frequency, for now.) Or do I get that wrong?
    >>> Yes, the LPC will in certain frequencies have longer latency
    >>> so it will be marginally slower in thumb mode.

    >>
    >> I find this tough to stomach, when talking about instruction
    >> streams, unless there are lots of branches salted in the mix.
    >> I know I must have read somewhere someone's analysis of many
    >> programs and the upshot of this, but I think it was for the
    >> x86 system and a product of Intel's research department some
    >> years ago and I've no idea how well that applies to the ARM
    >> core. I'm sure someone (perhaps you?) has access to such
    >> analyses and might share it here?

    >
    >LPC with 1 waitstate at 33 MHz.
    >
    >NOP 2 (fetches 8 instructions)
    >NOP 1
    >NOP 1
    >NOP 1
    >NOP 1
    >NOP 1
    >NOP 1
    >NOP 1
    >.........
    >Sum = 9
    >
    >Same code with SAM7, 0 waitstate at 33 MHz.
    >
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >NOP 1 (fetches 1 instruction)
    >.........
    >Sum = 8
    >
    >It should not be too hard to grasp.


    What you wrote is obvious. But it is completely off the
    question I asked. Take a close look at my words. I am
    asking about the kind of analysis I observed taking place at
    Intel during the P2 development. It was quite a lot of work
    getting applications, compiler tools, and so on and
    generating actual code and then analyzing it before
    continuing the processor family design.

    Such a simple NOP case would have been laughed at, had it
    been presented as representative in such meetings. I'm
    looking for the thorough-going analysis that often takes
    place when smart folks attack a design.

    >>>> You then discussed power consumption issues. Wouldn't it be
    >>>> the case that since the NXP ARM is accessing its flash at a
    >>>> 1/8th clock rate and the SAM7 is constantly operating its
    >>>> flash that the _average_ power consumption might very well be
    >>>> better with the NPX ARM, despite somewhat higher current when
    >>>> it is being accessed? Isn't the fact that the access cycle
    >>>> takes place far less frequently observed as a lower average?
    >>> As far as I understand the chip select for the internal flash
    >>> is always active when you run at higher frequencies
    >>> so there is a lot of wasted power.

    >>
    >> By "at higher frequencies" do you have a particular number
    >> above which your comment applies and below which it does not?

    >
    >Each chip designer makes their own choices.
    >I know of some chips starting to strobe the flash
    >chip select when below 1 - 4 MHz.
    >>
    >> In any case, this is the answer I was looking for and you
    >> don't appear to answer now. Why would anyone "run the flash"
    >> when the bus isn't active? It seems.... well, bone-headed.
    >> And I can't recall any chip design being that poor. I've
    >> seen cases where an external board design (not done by chip
    >> designers, but more your hobbyist designer type) that did
    >> things like that. But it is hard for me to imagine a chip
    >> designer being that stupid. It's almost zero work to be
    >> smarter than that.

    >
    >This is an automatic thing which measures the clock frequency
    >vs another clock frequency, and the "other" clock frequency
    >is often not that quick.


    I guess I can't follow your words, here, at all. Maybe I
    didn't write well, myself. In any case, I will just leave
    this with my question still hanging there for me. Someone
    else may understand and perhaps answer.

    >> So this suggests you want me to go study the situation. Maybe
    >> someone already knows, though, and can post it. I can hope.
    >>
    >>>> Perhaps the peak divided by 8, or so? (Again, keep the clock
    >>>> rates identical [downgraded to SAM7 rates in the NXP ARM
    >>>> case.]) Have you computed figures for both?
    >>> Best is to check the datasheet.

    >>
    >> I wondered if you already knew the answer. I suppose not,
    >> now.

    >
    >Looking at the LPC2141 datasheet, which seems to be the part
    >closest to the SAM7S256 you get
    >57 mA @ 3.3V = 188 mW @ 60 MHz = 3.135 mW/MHz.
    >
    >The SAM7S datasheet runs 33 mA @ 3.3 V @ 55 MHz = 1.98 mW/MHz,
    >You can, on the SAM7S choose to feed VDDCORE from 1.8V.
    >
    >The SAM7S is specified with USB enabled, so this
    >has to be used for the LPC as well for a fair comparison.


    Again, this misses my question entirely. But it may provide
    some answers to some questions not asked by me.

    >>> The CPU core used is another important parameter.
    >>> The SAM7S uses the ARM7TDMI while most others use the ARM7TDMI-S
    >>> (S = synthesizable) which inherently has 33 % higher power consumption.

    >>
    >> I'm aware of the general issue. Your use of "most other"
    >> does NOT address itself to the subject at hand, though. It
    >> leaves open either possibility for the LPC2. But it's a
    >> point worth keeping in mind if you make these chips, I
    >> suppose. For the rest of us, it's just a matter of deciding
    >> which works better by examining the data sheet. We don't
    >> have the option to move a -S design to a crafted ASIC.
    >>
    >> So this leaves some more or less interesting questions.
    >>
    >> (1) Where is a quality report or two on the subject of
    >> instruction mix for ARM applications, broken down by
    >> application spaces that differ substantially from each other,
    >> and what are the results of these studies?


    A question which you went around completely in the above and
    which still remains...

    >> (2) Does the LPC2 device really operate the flash all the
    >> time? Or not?

    >
    >You do not have any figures in the datasheet indicating
    >low power mode.


    I don't think I was asking about low power modes. I think
    there must be a language problem, now. Let me try this
    again.

    When a memory system is cycled, there is power consumption
    due to state changes and load capacitance and voltage swings
    based upon the current from C*dV/dt and the supply voltages
    involved. When the memory system isn't clocked, when it
    remains 'static', leakage current can take place but the
    level is a lot less. This isn't about a low power mode. It's
    simply something fairly common to memory systems. I don't
    know enough about flash to know exact differences here, but I
    suspect that an unclocked flash memory consumes less power
    than one being clocked consistently. Let me use your
    simplistic example from above:

    LPC with 1 waitstate at 33 MHz.

    NOP 2 (fetches 8) 1 memory cycle
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    NOP 1 0 memory cycles
    .......................................
    Sum 9 1 memory cycle

    Same code with SAM7, 0 waitstate at 33 MHz.

    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    NOP 1 (fetches 1) 1 memory cycle
    .......................................
    Sum 8 8 memory cycles

    As you say, "It should not be too hard to grasp."

    I am imagining that 8 cycles against the flash will cost more
    power than 1. But I may not be getting this right.
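
    To make that concrete, here is a toy model in C of the flash
    activity in the two NOP traces above. It counts only per-access
    (dynamic) cost, with an assumed cost ratio between a wide and a
    narrow access, and deliberately ignores any static sense-amplifier
    current, so it frames the question rather than answers it:

    #include <stdio.h>

    int main(void)
    {
        /* From the traces above, per 8 Thumb NOPs: */
        int wide_accesses   = 1;   /* one 128-bit line fetch (LPC-style)  */
        int narrow_accesses = 8;   /* eight narrower fetches (SAM7-style) */

        /* Assumed relative dynamic cost of one wide access vs one
         * narrow access: 1 means "same cost", 8 means a wide access
         * costs as much as eight narrow ones.  Pure guesswork. */
        for (double ratio = 1.0; ratio <= 8.0; ratio *= 2.0) {
            double wide_cost   = wide_accesses * ratio;
            double narrow_cost = narrow_accesses * 1.0;
            printf("cost ratio %.0f:1 -> wide %.1f vs narrow %.1f\n",
                   ratio, wide_cost, narrow_cost);
        }
        return 0;
    }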

    >> (3) Is the LPC2 a -S (which doesn't matter that much, but
    >> since the topic is brought up it might be nice to put that to
    >> bed?)

    >
    >Yes it is.
    >It should be enough to look in the datasheet.


    Thanks. That's a much clearer statement than before.

    Jon
     
    Jon Kirwan, Mar 28, 2010
    #19
  20. Jon Kirwan wrote:
    > On Sun, 28 Mar 2010 01:04:20 +0100, Ulf Samuelsson
    > <> wrote:
    >
    >> Jon Kirwan wrote:
    >>> On Sat, 27 Mar 2010 14:14:58 +0100, Ulf Samuelsson
    >>> <> wrote:
    >>>
    >>>> Jon Kirwan wrote:
    >>>>> On Sat, 27 Mar 2010 08:15:03 +0100, Ulf Samuelsson
    >>>>> <> wrote:
    >>>>>
    >>>>>> <snip of LPC2xxx vs SAM7S discussion>
    >>>>>> The 128 bit memory is overkill for thumb mode and just
    >>>>>> wastes power.
    >>>>>> <snip>
    >>>>> Ulf, let me remind you of something you wrote about the SAM7:
    >>>>>
    >>>>> "In thumb mode, the 32 bit access gives you two
    >>>>> instructions per cycle so in average this gives
    >>>>> you 1 instruction per clock on the SAM7."
    >>>>>
    >>>>> I gather this is regarding the case where there is 1 wait
    >>>>> state reading the 32-bit flash line -- so 2 clocks per line
    >>>>> and thus the 1 clock per 16-bit instruction (assuming it
    >>>>> executes in 1 clock.)
    >>>>>
    >>>>> Nico's comment about the NXP ARM, about the 128-bit wide
    >>>>> flash line-width, would (I imagine) work about the same
    >>>>> except that it reads at full clock rate speeds, no wait
    >>>>> states. So I gather, if it works similarly, that there are
    >>>>> eight thumb instructions per line (roughly.) I take it your
    >>>>> point is that since each instruction (things being equal)
    >>>>> cannot execute faster than 1 clock per, that it takes 8
    >>>>> clocks to execute those thumb instructions.
    >>>> Yes, the SAM7 is very nicely tuned to thumb mode.
    >>>> The LPC2 provides much more bandwidth than is needed
    >>>> when you run in thumb mode.
    >>> I think I gathered that much and didn't disagree, just
    >>> wondered.
    >>>
    >>>> Due to the higher latency for the LPC, due to slower
    >>>> flash, the SAM7 will be better at certain frequencies,
    >>>> but the LPC will have a higher max clock frequency.
    >>> I remember you writing that "SAM7 uses a 33 MHz flash, while
    >>> the LPC uses a 24 Mhz flash." It seems hard to imagine,
    >>> though, except perhaps for data fetch situations or
    >>> branching, it being actually slower. If it fetches something
    >>> like 8 thumb instructions at a time, anyway. As another
    >>> poster pointed out, the effective rate is much higher for
    >>> sequential reads no matter how you look at it. So it would
    >>> take branching or non-sequential data fetches to highlight
    >>> the difference.
    >>>
    >>> One would have to do an exhaustive, stochastic analysis of
    >>> application spaces to get a good bead on all this. But
    >>> ignorant of the details as I truly am right now, not having a
    >>> particular application in mind and just guessing where I'd
    >>> put my money if betting one way or another, I'd put it on 384
    >>> mb/sec memory over 132 mb/sec memory for net throughput.

    >> That is because you ignore the congestion caused by the fact
    >> that the ARM7 core only fetches 16 bits per access in thumb mode.

    >
    > I'm not entirely sure I understand. If both processors are
    > internally clocked at the same rate, they both have exactly
    > the same fetch rate in thumb mode.


    Yes, but the memory speed is important whenever you do a jump.

    In some frequency ranges the LPC has more waitstates than the SAM7, so
    the jump will be one clock cycle slower. The more jumps
    you have, the slower the relative performance of the LPC is.
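
    A back-of-the-envelope way to see how quickly that adds up is
    sketched below in C. The waitstate numbers are placeholders to show
    the trend, not datasheet values, and every taken branch is simply
    charged one pipeline refill at the flash waitstate count:

    #include <stdio.h>

    /* Cycles per Thumb instruction as a function of branch density.
     * Sequential instructions are assumed to take one clock; each
     * taken branch pays an extra penalty equal to the waitstates. */
    static double cpi(double branch_fraction, int waitstates)
    {
        return 1.0 + branch_fraction * waitstates;
    }

    int main(void)
    {
        double fractions[] = { 0.05, 0.10, 0.20 };
        for (unsigned i = 0; i < sizeof fractions / sizeof fractions[0]; i++) {
            double f = fractions[i];
            printf("branches %2.0f%%: 0 ws -> %.2f cpi, 1 ws -> %.2f cpi\n",
                   f * 100.0, cpi(f, 0), cpi(f, 1));
        }
        return 0;
    }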

    >
    >> At 33 MHz, the CPU can only use 66 MB / second,
    >> At 66 MHz, the CPU can only use 132 MB / second.

    >
    > Okay. I'm with you. Except that I haven't looked at the
    > data sheets to check for maximum core clock rates, since that
    > might bear on some questions.


    Yes, as I already mentioned, the LPC can run at a slightly faster clock,
    but this will not improve power consumption.


    >
    >> Since you can sustain 132 MB / second with a 33 MHz 32 bit memory,
    >> you do not need it to be wider to keep the pipeline running
    >> at zero waitstates for sequential fetch.

    >
    > In thumb mode and only talking about instructions and
    > assuming 66MHz peak. Do the processors (either of them)
    > sport separate buses, though, which can compete for the same
    > memory system? (Data + Instruction paths, for example.)


    This is one of the weaknesses of the ARM7.
    It only has a single bus, so data and instruction fetches share it.
    The LPC adds a bridge from ASB to AHB which allows multiple transfers,
    but this also causes synchronization delays for the CPU.
    There are pros and cons to doing this.


    >> For non-sequential fetch, the width is not important.

    >
    > In the case of instructions, I think I take your meaning.
    > Regarding data, no, I don't.
    >


    I am not sure, but my guess is that one of the most common
    reasons for data accesses to the flash is that the compiler
    loads 32-bit constants by PC-relative reads, and then
    the number of waitstates is really critical.
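
    The pattern I have in mind looks like this. A small C example; the
    peripheral address and the commented assembly are only meant to
    suggest the kind of literal-pool load an ARM7/Thumb compiler
    commonly emits, not the exact output of any particular toolchain:

    #include <stdint.h>
    #include <stdio.h>

    #define UART0_BASE 0xFFFD0000u   /* made-up peripheral address */

    volatile uint32_t *uart0_status(void)
    {
        /* A 32-bit constant like this usually cannot be built from
         * Thumb immediates, so compilers typically put it in a literal
         * pool in flash and fetch it with a PC-relative load, roughly:
         *
         *     ldr  r0, [pc, #offset]   ; read constant from literal pool
         *
         * That read is a non-sequential flash access, which is exactly
         * where the waitstate count bites. */
        return (volatile uint32_t *)UART0_BASE;
    }

    int main(void)
    {
        printf("UART0 status register at 0x%08lx\n",
               (unsigned long)(uintptr_t)uart0_status());
        return 0;
    }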

    >> Only the number of waitstates, and the SAM7 has same or less # of
    >> waitstates than the LPC.

    >
    > ... In the case of non-sequential instruction fetch.


    and random data fetch

    >
    > All this still fails to account for actual application mix
    > reports. I'm still curious (and I'm absolutely positive that
    > this is _done_ by chip designers because I observed the sheer
    > magnitude of the effort that took place at Intel during the
    > P2 design period) about application analysis that must have
    > been done on ARM (32-bit, 16-bit, and mixed modes) and should
    > be available somewhere. Do you have access to such reports?
    > It might go a long way in clarifying your points.


    I know that this was done for the AVR32, but I don't have those reports.
    From the design decisions, it is fairly obvious that NXP is focusing
    on people using ARM mode and Atmel on people running Thumb mode.


    >
    >> ----
    >> The 128 bit memory is really only useful for ARM mode.
    >> For thumb mode it is more or less a waste.
    >>
    >>>> The real point is that you are not necessarily
    >>>> faster
    >>> Yes, but the key here is the careful "not necessarily"
    >>> wording. Not necessarily, is true enough, as one could form
    >>> specific circumstances where you'd be right. But it seems to
    >>> me they'd be more your 'corner cases' than 'run of the mill.'

    >> I dont think running in Thumb mode is a corner case.

    >
    > Actually, I meant this plural, not singular. And I don't
    > have a perspective on actual applications in these spaces. So
    > I'll just plead mostly ignorance here and hold off saying
    > more, as I'm mostly trying to understand, not claim, things.


    If you can meet your design goals in Thumb mode,
    then it is almost always better to go for Thumb mode
    (due to code size).
    If you run in ARM mode, you can run at a lower frequency,
    assuming zero waitstates.
    If you add waitstates, then Thumb mode may actually be faster,
    so you may have to run at a higher frequency in ARM mode to
    compensate.
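
    A toy comparison of what that trade-off looks like, in C. The
    assumptions are all simplifications: the core retires at most one
    instruction per clock, every instruction needs a fetch, Thumb code
    is assumed to need about 30 % more instructions than ARM code for
    the same work, and a 32-bit line (one ARM or two Thumb
    instructions) costs 1 + waitstates clocks to fetch:

    #include <stdio.h>

    static double cycles(double instr, double instr_per_fetch, int ws)
    {
        double fetch_cpi = (1.0 + ws) / instr_per_fetch; /* fetch-limited */
        double cpi = fetch_cpi > 1.0 ? fetch_cpi : 1.0;  /* core-limited  */
        return instr * cpi;
    }

    int main(void)
    {
        double work = 1000.0;        /* ARM instructions for the job      */
        double thumb_expand = 1.3;   /* assumed code expansion for Thumb  */

        for (int ws = 0; ws <= 2; ws++)
            printf("ws=%d: ARM %4.0f clocks, Thumb %4.0f clocks\n", ws,
                   cycles(work, 1.0, ws),
                   cycles(work * thumb_expand, 2.0, ws));
        return 0;
    }

    With zero waitstates ARM mode wins on clocks; add a waitstate and
    Thumb mode pulls ahead, which is the crossover described above.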





    >
    >>>> because you have a wide memory.
    >>>> The speed of the memory counts as well.
    >>> Of course. So people who seem to care about the final speed
    >>> and little else should indeed do some analysis before
    >>> deciding. But if they don't know their application well
    >>> enough to make that comparison... hmm.
    >>>
    >>>> There are a lot of parameters to take into account
    >>>> if you want to find the best part.
    >>> Yes. That seems to ever be true!
    >>>
    >>>> People with different requirements will find different
    >>>> parts to be the best.
    >>> Yes, no argument. I was merely curious about something else
    >>> which you mostly didn't answer, so I suppose if I care enough
    >>> I will have to go find out on my own.... see below.
    >>>
    >>>> If you start to use high speed communications, then the
    >>>> PDC of the SAM7 serial ports tend to even out any
    >>>> difference in performance vs the LPC very quickly.
    >>> Some parts have such wonderfully sophisticated peripherals.
    >>> Some of these are almost ancient (68332, for example.) So
    >>> it's not only a feature of new parts, either. Which goes
    >>> back to your point that there are a lot of parameters to take
    >>> into account, I suppose.
    >>>
    >>>>> The discussion could move between discussing instruction
    >>>>> streams to discussing constant data tables and the like, but
    >>>>> staying on the subject of instructions for the following....
    >>>> Yes, this will have an effect.
    >>>> Accessing a random word should be faster on the SAM7
    >>>> and, assuming you copy sequentially a large area
    >>>> having 128 bit memory will be beneficial.
    >>> The 'random' part being important here. In some cases, that
    >>> may be important where the structures are 'const' and can be
    >>> stored in flash and are accessed in a way that cannot take
    >>> advantage of the 128-bit wide lines. A binary search on a
    >>> calibration table with small table entry sizes, perhaps,
    >>> might be a reasonable example that actually occurs often
    >>> enough and may show off your point well. Other examples,
    >>> such as larger element sizes (such as doubles or pairs of
    >>> doubles) for that binary search or a FIR filter table used
    >>> sequentially, might point the other way.
    >>>
    >>>>> So the effect is that it takes the same number of clocks to
    >>>>> execute 1-clock thumb instructions on either system?
    >>>>> (Ignoring frequency, for now.) Or do I get that wrong?
    >>>> Yes, the LPC will in certain frequencies have longer latency
    >>>> so it will be marginally slower in thumb mode.
    >>> I find this tough to stomach, when talking about instruction
    >>> streams, unless there are lots of branches salted in the mix.
    >>> I know I must have read somewhere someone's analysis of many
    >>> programs and the upshot of this, but I think it was for the
    >>> x86 system and a product of Intel's research department some
    >>> years ago and I've no idea how well that applies to the ARM
    >>> core. I'm sure someone (perhaps you?) has access to such
    >>> analyses and might share it here?

    >> LPC with 1 waitstate at 33 MHz.
    >>
    >> NOP 2 (fetches 8 instructions)
    >> NOP 1
    >> NOP 1
    >> NOP 1
    >> NOP 1
    >> NOP 1
    >> NOP 1
    >> NOP 1
    >> .........
    >> Sum = 9
    >>
    >> Same code with SAM7, 0 waitstate at 33 MHz.
    >>
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> NOP 1 (fetches 1 instruction)
    >> .........
    >> Sum = 8
    >>
    >> It should not be too hard to grasp.

    >
    > What you wrote is obvious. But it is completely off the
    > question I asked. Take a close look at my words. I am
    > asking about the kind of analysis I observed taking place at
    > Intel during the P2 development. It was quite a lot of work
    > getting applications, compiler tools, and so on and
    > generating actual code and then analyzing it before
    > continuing the processor family design.


    It is just to show that the SAM7 does not need to have
    a 128-bit memory to achieve better performance than the LPC
    in Thumb mode. A faster flash memory is what is needed.

    >
    > Such a simple NOP case would have been laughed at, had it
    > been presented as representative in such meetings. I'm
    > looking for the thorough-going analysis that often takes
    > place when smart folks attack a design.
    >


    The real performance is application specific, so such an
    analysis may be just as useless. My real purpose is to show
    that the 128-bit memory does not necessarily perform better
    than a faster 32-bit memory.

    For that, the example is good and simple enough.
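
    Written out as arithmetic, the same example looks like this (a
    small C sketch mirroring the NOP traces above; purely sequential
    Thumb code, no branches, and the waitstate figures used earlier in
    the thread):

    #include <stdio.h>

    int main(void)
    {
        int n = 8000;            /* sequential Thumb instructions */

        /* LPC-style: one extra waitstate cycle per 128-bit line of
         * 8 Thumb instructions.  SAM7-style: zero waitstates at this
         * clock, one instruction per clock. */
        int lpc_cycles  = n + (n + 7) / 8;
        int sam7_cycles = n;

        printf("LPC : %d cycles for %d instructions\n", lpc_cycles, n);
        printf("SAM7: %d cycles for %d instructions\n", sam7_cycles, n);
        return 0;
    }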

    >>>>> You then discussed power consumption issues. Wouldn't it be
    >>>>> the case that since the NXP ARM is accessing its flash at a
    >>>>> 1/8th clock rate and the SAM7 is constantly operating its
    >>>>> flash that the _average_ power consumption might very well be
    >>>>> better with the NPX ARM, despite somewhat higher current when
    >>>>> it is being accessed? Isn't the fact that the access cycle
    >>>>> takes place far less frequently observed as a lower average?
    >>>> As far as I understand the chip select for the internal flash
    >>>> is always active when you run at higher frequencies
    >>>> so there is a lot of wasted power.
    >>> By "at higher frequencies" do you have a particular number
    >>> above which your comment applies and below which it does not?

    >> Each chip designer makes their own choices.
    >> I know of some chips starting to strobe the flash
    >> chip select when below 1 - 4 MHz.
    >>> In any case, this is the answer I was looking for and you
    >>> don't appear to answer now. Why would anyone "run the flash"
    >>> when the bus isn't active? It seems.... well, bone-headed.
    >>> And I can't recall any chip design being that poor. I've
    >>> seen cases where an external board design (not done by chip
    >>> designers, but more your hobbyist designer type) that did
    >>> things like that. But it is hard for me to imagine a chip
    >>> designer being that stupid. It's almost zero work to be
    >>> smarter than that.

    >> This is an automatic thing which measures the clock frequency
    >> vs another clock frequency, and the "other" clock frequency
    >> is often not that quick.

    >
    > I guess I can't follow your words, here, at all. Maybe I
    > didn't write well, myself. In any case, I will just leave
    > this with my question still hanging there for me. Someone
    > else may understand and perhaps answer.


    You have to figure out whether you are running faster or slower
    than a certain frequency, and you have a limited number
    of clocks available inside the chip.
    If you keep the flash on, then it can respond instantly.
    If it is off, then it may take some extra time to
    start it up, so you need clock edges on the alternate
    clock before the access, which can be used to start
    up the flash.
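
    In rough pseudo-hardware terms, the decision might look something
    like the sketch below. This is a hypothetical model written in C;
    the names, the threshold and the reference-clock mechanism are
    invented for illustration and are not taken from any datasheet:

    #include <stdbool.h>
    #include <stdint.h>

    #define CS_GATE_THRESHOLD_HZ 2000000u  /* somewhere in the 1 - 4 MHz range */

    struct flash_ctrl {
        bool cs_always_on;           /* CS held active: instant response */
        uint32_t estimated_cpu_hz;
    };

    /* Called from the slow reference-clock domain with the number of
     * CPU clock edges counted during one reference period. */
    static void flash_update_gating(struct flash_ctrl *fc,
                                    uint32_t cpu_edges, uint32_t ref_hz)
    {
        fc->estimated_cpu_hz = cpu_edges * ref_hz;
        /* Fast CPU clock: keep CS asserted and trade static power for
         * zero wake-up latency.  Slow CPU clock: strobe CS per access,
         * since there are plenty of reference edges to wake the flash. */
        fc->cs_always_on = (fc->estimated_cpu_hz >= CS_GATE_THRESHOLD_HZ);
    }

    int main(void)
    {
        struct flash_ctrl fc = { false, 0 };
        flash_update_gating(&fc, 500, 32768); /* ~16 MHz CPU vs 32 kHz ref */
        return fc.cs_always_on ? 0 : 1;
    }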


    >
    >>> So this suggests you want me to go study the situation. Maybe
    >>> someone already knows, though, and can post it. I can hope.
    >>>
    >>>>> Perhaps the peak divided by 8, or so? (Again, keep the clock
    >>>>> rates identical [downgraded to SAM7 rates in the NXP ARM
    >>>>> case.]) Have you computed figures for both?
    >>>> Best is to check the datasheet.
    >>> I wondered if you already knew the answer. I suppose not,
    >>> now.

    >> Looking at the LPC2141 datasheet, which seems to be the part
    >> closest to the SAM7S256 you get
    >> 57 mA @ 3.3V = 188 mW @ 60 MHz = 3.135 mW/MHz.
    >>
    >> The SAM7S datasheet runs 33 mA @ 3.3 V @ 55 MHz = 1.98 mW/MHz,
    >> You can, on the SAM7S choose to feed VDDCORE from 1.8V.
    >>
    >> The SAM7S is specified with USB enabled, so this
    >> has to be used for the LPC as well for a fair comparison.

    >
    > Again, this misses my question entirely. But it may provide
    > some answers to some questions not asked by me.


    It shows the expected power consumption, including whatever
    tricks the chip applies.


    >
    >>>> The CPU core used is another important parameter.
    >>>> The SAM7S uses the ARM7TDMI while most others use the ARM7TDMI-S
    >>>> (S = synthesizable) which inherently has 33 % higher power consumption.
    >>> I'm aware of the general issue. Your use of "most other"
    >>> does NOT address itself to the subject at hand, though. It
    >>> leaves open either possibility for the LPC2. But it's a
    >>> point worth keeping in mind if you make these chips, I
    >>> suppose. For the rest of us, it's just a matter of deciding
    >>> which works better by examining the data sheet. We don't
    >>> have the option to move a -S design to a crafted ASIC.
    >>>
    >>> So this leaves some more or less interesting questions.
    >>>
    >>> (1) Where is a quality report or two on the subject of
    >>> instruction mix for ARM applications, broken down by
    >>> application spaces that differ substantially from each other,
    >>> and what are the results of these studies?

    >
    > A question which you went around completely in the above and
    > which still remains...
    >


    Don't have any real data, but I would expect that jumps
    are


    >>> (2) Does the LPC2 device really operate the flash all the
    >>> time? Or not?

    >> You do not have any figures in the datasheet indicating
    >> low power mode.

    >
    > I don't think I was asking about low power modes. I think
    > there must be a language problem, now. Let me try this
    > again.
    >
    > When a memory system is cycled, there is power consumption
    > due to state changes and load capacitance and voltage swings
    > based upon the current from C*dV/dt and the supply voltages
    > involved. When the memory system isn't clocked, when it
    > remains 'static', leakage current can take place but the
    > level is a lot less. This isn't about a low power mode. It's
    > simply something fairly common to memory systems. I don't
    > know enough about flash to know exact differences here, but I
    > suspect that an unclocked flash memory consumes less power
    > than one being clocked consistently. Let me use your
    > simplistic example from above:


    I think that when the flash is activated,
    you have quite a lot of static current in the sense amplifiers.
    The total sense amplifier current is proportional to
    the number of active sense amplifiers.

    >
    > LPC with 1 waitstate at 33 MHz.
    >
    > NOP 2 (fetches 8) 1 memory cycle
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > NOP 1 0 memory cycles
    > ......................................
    > Sum 9 1 memory cycle
    >
    > Same code with SAM7, 0 waitstate at 33 MHz.
    >
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > NOP 1 (fetches 1) 1 memory cycle
    > ......................................
    > Sum 8 8 memory cycles
    >
    > As you say, "It should not be to hard to grasp."
    >
    > I am imagining that 8 cycles against the flash will cost more
    > power than 1. But I may not be getting this right.


    Not if the sense amplifiers are turned on in both cases.
    There will be some switching current, but I have been
    told that the current in the sense amplifiers is much more significant.

    The 1st-generation AVRs did not turn off the chip select.
    At 16 MHz, the flash will provide data out in < 67 ns.
    When you run at 33 kHz, the instruction cycle is 30 us.
    The flash still delivers data out in 67 ns, but the flash
    burns power almost as if the chip were running at 16 MHz.
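
    Put as numbers, using only the figures just given (a minimal
    sketch, nothing here beyond the arithmetic):

    #include <stdio.h>

    /* Duty-cycle arithmetic for the AVR example above: the flash
     * access itself takes about 67 ns, but at a 33 kHz CPU clock an
     * instruction cycle is about 30 us, so the array is only needed a
     * tiny fraction of the time.  With the chip select held on, the
     * sense amplifiers burn power for the whole period anyway. */
    int main(void)
    {
        double access_ns = 67.0;
        double cycle_ns  = 1e9 / 33e3;        /* ~30300 ns at 33 kHz */

        printf("instruction cycle : %.1f us\n", cycle_ns / 1000.0);
        printf("flash usefully busy %.3f%% of the time\n",
               100.0 * access_ns / cycle_ns);
        return 0;
    }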



    >
    >>> (3) Is the LPC2 a -S (which doesn't matter that much, but
    >>> since the topic is brought up it might be nice to put that to
    >>> bed?)

    >> Yes it is.
    >> It should be enough to look in the datasheet.

    >
    > Thanks. That's a much clearer statement than before.
    >
    > Jon
     
    Ulf Samuelsson, Mar 28, 2010
    #20