
CPU selection

Discussion in 'Embedded' started by francesco, Mar 29, 2005.

  1. francesco

    francesco Guest

    For a sensor application, I need to select a CPU able to:
    - wake-up from power down;
    - sample a sound signal for at least 20 sec;
    - process the signal in time and frequency domain (at least, FIR and
    FFT);
    - send some result on a serial port;
    - fall back to sleep for at least 2 min.

    The application should be solar powered, so it needs to be very
    low power.

    I have considered some 8-bit processors, in particular the Atmel
    ATmega128L.
    These processors offer very good power-down current, but awful
    processing performance. This means that, for instance, an FFT takes
    more time and, in consequence, more energy.

    Another choice: a low-end DSP. I have considered:
    - TI - TMS320C2407A;
    - Analog Devices - ADSP21990;
    - Microchip - dsPIC30F3012.
    The first two DSPs offer a lot of processing power, but need about 200 uA
    in power down. The last one is, in a sense, a middle way.

    Finally, I've considered a 32-bit processor: the Intel PXA270 (XScale).
    Good processing, but it needs a lot of power to enter and exit power
    down. It also has a lot of unneeded peripherals which, in the end, use
    power. Finally, its von Neumann architecture should cause bus
    contention between data and code, reducing processing power relative to
    the DSPs listed above.

    I'm leaning toward Texas Instruments, leaving Microchip as a
    possibility.

    Does anyone have any suggestions?
    Does anyone know of benchmarks to consult (for example, how much
    time is required for an FIR or an FFT on different platforms)?

    Thanks a lot.
    Best regards.
     
    francesco, Mar 29, 2005
    #1

  2. francesco

    rickman Guest

    It is a difficult trade off between idle power used and speed of
    processing. But with 20 seconds of sample, I would assume your sleep
    time is much longer than that, so the sleep power is very important.
    In that case I would suggest a very low power processor such as the TI
    MSP430. They have very limited RAM, however. You don't say what your
    RAM requirements are.

    You can get around the high sleep power of any device by using an
    external device to turn power on and off. An RTC chip can do this.

    You might look at any of a number of ARM MCU chips. TI has the TMS470,
    Atmel has the AT91SAM7, Analog Devices has the a family and Philips has
    the LPC2000 line. These chips can all go much lower than 200 uA in low
    power modes and should process much faster than the 8 bit MCUs. Check
    out the ARM Yahoo groups.

    If you have other constraints, let us know.
     
    rickman, Mar 29, 2005
    #2

  3. francesco

    ricore Guest

    Why not use a two-stage system, split into:

    - One ultra-low-consumption stage using a cheap 8-bit MCU (like the 8052
    family) to control sound acquisition into memory and to handle
    power and serial.
    - One higher-consumption stage using any DSP you like, which:
      - powers up during sound acquisition (the MCU triggers this event);
      - reads from the first stage's memory, computes the FFT and FIR fast,
      then stores the results in the memory;
      - powers down (sending a signal to the MCU that results are
      available).

    The MCU will then send the results over the serial port, and wait for
    the next system triggering event.

    I've heard that there are good solar power management chips, but I can't
    remember any pointer to that info. A little searching won't hurt on that
    subject, since it could really be problematic.

    I'd love to read more about this project.
    Best regards
     
    ricore, Mar 29, 2005
    #3
  4. For a sensor application, I need to select a CPU able to:
    I think it is critical to know the sample rate and resolution.
    If we assume 48 kHz and 16-bit mono, you are looking at about 2 MByte of
    data for 20 seconds!
    If you use an 8 kHz 8-bit PCM codec, then you need 160 kB of SRAM.
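The buffer sizes quoted above follow directly from sample rate, resolution and capture length. A minimal sketch of that arithmetic, assuming uncompressed mono PCM and the 20-second capture from the original question (the function name `buffer_bytes` is illustrative only):

```python
def buffer_bytes(sample_rate_hz: int, bits_per_sample: int,
                 seconds: int, channels: int = 1) -> int:
    """Raw PCM buffer size in bytes for an uncompressed capture."""
    total_bits = sample_rate_hz * bits_per_sample * seconds * channels
    return total_bits // 8


if __name__ == "__main__":
    # 48 kHz, 16-bit mono, 20 s: 1,920,000 bytes (roughly 2 MByte)
    print(buffer_bytes(48_000, 16, 20))
    # 8 kHz, 8-bit mono, 20 s: 160,000 bytes (160 kB)
    print(buffer_bytes(8_000, 8, 20))
```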

    The best alternative I can think of is the AT91R40008, which incorporates
    256 kB of SRAM.
    It has no nice peripherals, so you should add a small micro with an ADC,
    like an ATmega48, which samples the audio and sends it over the USART to
    the AT91.
    The AT91 runs in idle mode, with the CPU shut down and the UART enabled.
    The built-in DMA can receive up to 64 kB of audio before the AVR wakes up
    the ARM, so the ARM needs to be active three times during a 160 kB transfer.
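The "three times" figure is a ceiling division of the capture size by the DMA block size. A small sketch, assuming the 160 kB capture and a 64 kB DMA limit as stated in the post (whether "64 kB" means 64,000 or 65,536 bytes is an assumption either way, and both give the same count; `dma_wakeups` is a hypothetical helper name):

```python
def dma_wakeups(total_bytes: int, dma_block_bytes: int) -> int:
    """Number of DMA blocks (and thus CPU wake-ups) needed for a transfer.

    Uses integer ceiling division: a partially filled last block
    still costs one wake-up.
    """
    return -(-total_bytes // dma_block_bytes)


if __name__ == "__main__":
    print(dma_wakeups(160_000, 64 * 1024))  # 3
    print(dma_wakeups(160_000, 64_000))     # 3
```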

    If you sample less, and can fit all audio samples into 64 kB, then the
    AT91SAM7S256 is a good choice.
    It can power down everything except a timer, which can periodically
    trigger an AD conversion.
    When the AD conversion is complete, the DMA controller writes the sample
    to internal SRAM.
    The CPU does not need to be powered up during the ADC process, so it will
    use very little power.

    The SAM7S should draw 26 uA at 32 kHz. You need to use an external LDO
    to get the lowest power consumption.

    An FPSLIC (AVR + FPGA) can sample the ADC autonomously
    and can do FIR processing in the FPGA portion at high speed.

     
    Ulf Samuelsson, Mar 29, 2005
    #4
  5. This sounds like a two-device problem: one small uC for timing/power
    verify, and ideally ADC sampling and small buffering, which wakes up
    the DSP core only for the packet-crunching.

    Small uC candidates would be SiLabs C8051F (good analog) and
    TI's MSP430 ( low RTC operation, but only average analog performance )

    You will need power budgets for all parts, including the serial and
    audio pre-amps. A 20 s : 2 min on/off ratio is not a huge off-time, so
    run-time power may dominate.
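To see why run-time power can dominate at this duty cycle, a time-weighted average is enough. The sketch below uses the 20 s active and 120 s sleep periods from the thread; the 20 mA active current is purely a hypothetical placeholder, not a figure for any part mentioned here, and the 200 uA sleep figure is the one quoted for the DSPs in the first post.

```python
def average_current_ua(active_ua: float, active_s: float,
                       sleep_ua: float, sleep_s: float) -> float:
    """Time-weighted average supply current over one wake/sleep cycle."""
    charge = active_ua * active_s + sleep_ua * sleep_s  # in uA*s
    return charge / (active_s + sleep_s)


if __name__ == "__main__":
    ACTIVE_UA = 20_000  # hypothetical 20 mA while sampling/processing
    # Sleep at 200 uA versus a hypothetical 2 uA part:
    print(round(average_current_ua(ACTIVE_UA, 20, 200, 120)))  # 3029
    print(round(average_current_ua(ACTIVE_UA, 20, 2, 120)))    # 2859
```

Under these assumed numbers, cutting sleep current by a factor of 100 lowers the average by only about 6%, which is the sense in which the active phase dominates. With a much longer sleep interval the balance would shift.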

    The DSP vendors will have some info, but you will need to decide
    the precision that matters to you.
     
    Jim Granville, Mar 29, 2005
    #5
  6. francesco

    RS Guest

    Dear Francesco.

    Don't look any further!
    MSP430!!!!!
    16 bit power, 12bit A/D resolution, lots of peripherals and very, very low
    power with a 16bit by 16 bit multiplier unit.

    Best regards.

    Regis.
     
    RS, Mar 30, 2005
    #6
  7. francesco

    dmm Guest

    MSP430F1611 has 48K flash, 10K ram.
    MSP430F1612 has 55K flash, 5K ram
     
    dmm, Mar 30, 2005
    #7
  8. francesco

    Andrew M Guest

    Because your 'processing' requirements are non-trivial, I would strongly
    recommend the TI TMS320VC5501/2 DSP. Not so good on sleep current but great
    on run current, per MIP. You'll do more in 1 MIP with VC5502 than in 30 MIPS
    with an 8 bitter.

    You could easily keep the VC5501 in standby or even power it down. If
    powered down, it will need to be booted each time which will cost time and
    current.

    I doubt you'll find anything with a lower power/MIPS ratio than these guys.
    Even with MSP430.

    -Andrew M
     
    Andrew M, Apr 1, 2005
    #8
  9. Well,

    for power it is the MSP430, as mentioned several times; for processing
    power and a wider range of memory I would recommend an ARM, like Ulf did.
    However, I would recommend a lower-cost device like the LPC2106 with
    128k Flash and 64k SRAM, running from Flash very close to full speed in
    ARM mode (between 90 and 99% of SRAM speed). With this device you would
    have 64k SRAM for buffer while running out of flash.

    More information here:
    http://www.semiconductors.philips.com/pip/LPC2106.html

    Great user group with lots of information here:
    http://groups.yahoo.com/group/lpc2000/

    Low-cost boards are available from Olimex, or from IAR with a
    size-limited compiler.

    Sleep current is typically between 10 and 20 uA, not as good as 8-bit but
    MUCH better than the DSP options.

    An Schwob

     
    An Schwob in USA, Apr 1, 2005
    #9
  10. ....

    Which is not enough to store 20 seconds of uncompressed audio.
    That is why I proposed the AT91R40008, which has 256 kB of zero-wait-state
    32-bit SRAM.
    I would be curious to know if you could substantiate the claim of 90-99%.
    A non-sequential fetch takes 3 clocks, and a sequential fetch takes 1 clock.
    To reach 90% performance you have to execute on average 18 instructions
    between jumps.
    (solve the equation: 0.9 = n / (3 + (n - 1)))
    Even with conditional instructions, this is large...

    To reach 99% performance you have to execute on average 198 instructions
    between jumps.
    (solve the equation: 0.99 = n / (3 + (n - 1)))
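The two equations can be checked with exact rational arithmetic. This sketch assumes only the simple model stated in the post: a run of n instructions between jumps costs one 3-clock non-sequential fetch plus (n - 1) one-clock sequential fetches, compared against n clocks from zero-wait-state memory. Function names are illustrative.

```python
from fractions import Fraction


def efficiency(n: int, nonseq_clocks: int = 3) -> Fraction:
    """Fraction of zero-wait-state speed for a run of n instructions."""
    return Fraction(n, nonseq_clocks + (n - 1))


def run_length_for(eff: Fraction, nonseq_clocks: int = 3) -> Fraction:
    """Solve eff = n / (nonseq_clocks + (n - 1)) for n."""
    return eff * (nonseq_clocks - 1) / (1 - eff)


if __name__ == "__main__":
    print(run_length_for(Fraction(90, 100)))  # 18
    print(run_length_for(Fraction(99, 100)))  # 198
    print(efficiency(18), efficiency(198))    # 9/10 99/100
```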

    Does Philips have a branch target cache? Otherwise there is no chance of
    meeting the claim.
    --
    Best Regards,
    Ulf Samuelsson

    This message is intended to be my own personal view and it
    may or may not be shared by my employer Atmel Nordic AB
     
    Ulf Samuelsson, Apr 2, 2005
    #10
  11. francesco

    rickman Guest

    I have discussed this before with LPC2000 proponents and they are never
    able to support this claim. I guess they hear it and repeat it without
    considering the validity. The slowdown from branches is very
    significant since it adds multiple wait states. I think your equations
    do not fully calculate the negative impact a branch causes. Not only
    do you have to start a new flash read, but it is random where in the 4
    word read execution will start. This can result in more waits to fetch
    the next 4 words.

    I have yet to see anyone actually benchmark any code, flash vs. ram in
    the LPC2000 parts. Has Atmel provided any numbers for this on the
    SAM7?

    Seems to me like several approaches could be combined. If the OKI
    parts with 8 kB cache were fitted with 128 bit wide flash it would get
    as close as possible to the full rate of these CPUs.

    I will say that the Flash seems to be the weak link in the chain for
    all fast MCUs. Until this is properly solved, I guess we can expect
    ARM MCU performance to be limited to the 60 MHz ballpark.
     
    rickman, Apr 4, 2005
    #11
  12. francesco

    Stephen Pelc Guest

    I've benchmarked our Forth compilers and see about 95% of the
    no-wait performance. This depends somewhat on the code sequence
    and the loop alignment - in some small loops aligning the head
    of the loop to a 16 byte boundary is beneficial. All tests
    were performed on an LPC2106 set to 60MHz.

    Stephen


    --
    Stephen Pelc,
    MicroProcessor Engineering Ltd - More Real, Less Time
    133 Hill Lane, Southampton SO15 5AF, England
    tel: +44 (0)23 8063 1441, fax: +44 (0)23 8033 9691
    web: http://www.mpeltd.demon.co.uk - free VFX Forth downloads
     
    Stephen Pelc, May 4, 2005
    #12
  13. Stephen,

    this is good data and is totally in line with what we see. Among the
    programs we tested, no reasonably sized program (>2k) performed slower
    than 90% of SRAM speed. The typical speed was around 95%, and
    with small DSP algorithms we could get close to 99%.

    The true benefit of the LPC2000 memory interface shows up when
    executing in ARM mode. While Thumb mode is more compact and should be
    used for all program parts that are not real-time critical, ARM mode is,
    by definition of the architecture, faster. It gets slowed down by bus
    limitations in most embedded ARM Flash interfaces if the bus width is
    less than 64 bits and the clock is faster than the Flash access time
    allows, e.g. faster than 30 MHz.
    It takes approx. 5 ARM instructions to perform the same function as 7
    Thumb instructions. Assuming no bandwidth limitation, this is a
    performance improvement of up to 40% for ARM over Thumb. Thumb mode
    (7 x 16 bit = 112 bit), however, saves 30% of code space over ARM
    (5 x 32 bit = 160 bit).
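The 40% and 30% figures follow from the 5-versus-7 instruction ratio. A minimal sketch of that arithmetic, assuming (as the post does) no bandwidth limitation and, as a further simplification of mine, the same number of cycles per ARM and Thumb instruction:

```python
from fractions import Fraction


def arm_speedup(arm_instrs: int = 5, thumb_instrs: int = 7) -> Fraction:
    """Relative speed gain of ARM over Thumb for the same work."""
    return Fraction(thumb_instrs, arm_instrs) - 1


def thumb_size_saving(arm_instrs: int = 5, thumb_instrs: int = 7,
                      arm_bits: int = 32, thumb_bits: int = 16) -> Fraction:
    """Relative code-size saving of Thumb over ARM for the same work."""
    return 1 - Fraction(thumb_instrs * thumb_bits, arm_instrs * arm_bits)


if __name__ == "__main__":
    print(arm_speedup())        # 2/5, i.e. 40% faster
    print(thumb_size_saving())  # 3/10, i.e. 30% smaller
```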
    To get the best performance / code density trade-off with an ARM
    device, you want to have most of your code in Thumb mode, probably more
    than 90%, but the smaller part of the code that is most real-time
    critical you want to run in ARM mode. With the LPC2000 you can run it
    from Flash in ARM mode and will still get the average 95%, while with
    other ARM7 implementations it is highly recommended to copy the fast
    routines into the SRAM and execute from there. This is some hassle with
    the software structure, but well worth the effort to get up to a 40% gain
    in execution speed.
    In summary, you can get more speed out of a 60 MHz LPC2000 running
    out of Flash than from another 55 MHz device running out of SRAM. This
    can save a lot of SRAM and code complexity overhead.
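The 60 MHz versus 55 MHz comparison can be expressed as an effective clock rate. This is a rough sketch only: it treats "percent of SRAM speed" as a simple multiplier on clock frequency, which is an assumption on my part, and it uses the 90-95% figures claimed in the thread rather than measured data.

```python
from fractions import Fraction


def effective_mhz(clock_mhz: int, fraction_of_sram_speed: Fraction) -> Fraction:
    """Clock rate scaled by the fraction of zero-wait-state speed achieved."""
    return clock_mhz * fraction_of_sram_speed


def break_even_fraction(flash_clock_mhz: int, sram_clock_mhz: int) -> Fraction:
    """Flash efficiency needed to match a device running from SRAM."""
    return Fraction(sram_clock_mhz, flash_clock_mhz)


if __name__ == "__main__":
    print(effective_mhz(60, Fraction(95, 100)))  # 57, above 55
    print(effective_mhz(60, Fraction(90, 100)))  # 54, just below 55
    print(break_even_fraction(60, 55))           # 11/12, about 91.7%
```

Under this model the claim holds at the typical 95% figure, while at the 90% lower bound the two devices come out roughly equal.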

    An Schwob
     
    An Schwob in USA, May 5, 2005
    #13