32-Bit MCU Decision

Discussion in 'Technical Support' started by Jerry Gardner, Mar 7, 2012.

  1. Jerry Gardner

    Jerry Gardner

    Mar 5, 2012
    I want to move to a modern 32-bit MCU for my next projects (primarily data acquisition and robotics) and would like some advice on which family to choose. I've narrowed my short list down to these three:

    • ARM Cortex-M3
    • PIC32
    • AVR32 UC3
    The Cortex-M3 seems very popular and has multiple vendors, but I'm looking at other things, too, like development environments and tools, peripherals, quality of documentation, and availability of evaluation boards.

    A concern with the PIC32 and AVR32 is that they're both single-vendor and prone to discontinuation.

    My needs are not too stringent in terms of clock speed--anything over 50 MHz will be okay, but I do need at least 32-64KB of RAM. I'll be porting some code and an RTOS that I originally wrote for the M68K.

    Any and all advice appreciated.
    Jerry Gardner, Mar 7, 2012
  2. Jerry Gardner

    I got no replies to my original posting, but in the meantime I decided to go ahead and work with all three architectures. Here’s a summary of my experiences working with ARM Cortex-M3, PIC32, and AVR32 UC3.

    I started with the ARM Cortex-M3 in the guise of the NXP LPC1768. The LPC1768 is a Cortex-M3 MCU with 512KB of FLASH, 64KB of RAM, Ethernet, CAN, USB 2.0, 12-bit ADC, etc. The Cortex-M3 core implements the newer ARMv7-M architecture.

    From the perspective of an RTOS, the Cortex-M3 architecture was certainly the easiest of the three to port to. It has a nested, vectored interrupt controller (NVIC) that’s standard across all Cortex-M3 variants regardless of manufacturer. Ditto for the SysTick timer, which is typically used to generate basic timing for an OS. If I ever port my RTOS to another vendor’s MCU, I’ll no doubt have to rewrite most of the peripheral drivers, but the interrupt handling and basic timer tick code won’t need to change.
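
    As a rough illustration of the tick setup, the SysTick reload value is the number of CPU cycles per tick minus one, and it has to fit in the timer's 24-bit LOAD field. The function name below is hypothetical and is not taken from the RTOS described here.

```c
#include <stdint.h>

#define SYSTICK_MAX_RELOAD 0x00FFFFFFu  /* SysTick LOAD is a 24-bit field */

/* Returns the LOAD value for the requested tick rate,
 * or 0 if the rate cannot be represented. */
uint32_t systick_reload(uint32_t cpu_hz, uint32_t tick_hz)
{
    if (tick_hz == 0)
        return 0;

    uint32_t cycles = cpu_hz / tick_hz;   /* CPU cycles per tick */
    if (cycles < 2 || cycles - 1 > SYSTICK_MAX_RELOAD)
        return 0;

    return cycles - 1;                    /* counter runs from LOAD down to 0 */
}
```

    For a 100 MHz core and a 1 kHz tick this gives 99999. A 1 Hz tick does not fit in 24 bits at that clock, so the function returns 0.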

    The Cortex-M3 has a well thought-out interrupt handling and priority mechanism with all the support needed by the typical RTOS. One particularly useful feature is PendSV, which pends a software interrupt. I program the interrupt priority of PendSV to the lowest in the system and use it to invoke my scheduler. Since its priority is lower than any other interrupt source, I can invoke it anywhere, even in an interrupt handler, without worrying about nesting within the kernel. The scheduler is thus entered only after all other higher priority (i.e. all of them) handlers complete. This is fast and efficient when combined with the M3’s interrupt tail-chaining abilities. Measured context switch time on the LPC1768 running at 100MHz is 4.2 microseconds.
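
    A minimal sketch of the PendSV idea follows. The register addresses and bit positions are the ARMv7-M System Control Block ones. The function names are hypothetical, and the registers are passed as pointers only so the logic can be exercised off-target.

```c
#include <stdint.h>

/* ARMv7-M System Control Block registers */
#define SCB_ICSR_ADDR   0xE000ED04u   /* Interrupt Control and State Register */
#define SCB_SHPR3_ADDR  0xE000ED20u   /* System Handler Priority Register 3 */

#define ICSR_PENDSVSET      (1u << 28) /* write 1 to pend PendSV */
#define SHPR3_PENDSV_SHIFT  16         /* PendSV priority is bits 23:16 */

/* Give PendSV the numerically highest (lowest urgency) priority.
 * Unimplemented low-order priority bits are ignored by the hardware. */
void pendsv_set_lowest_priority(volatile uint32_t *shpr3)
{
    *shpr3 |= (uint32_t)0xFFu << SHPR3_PENDSV_SHIFT;
}

/* Ask for a scheduler pass. Safe to call from any handler: PendSV
 * only runs once every higher-priority handler has returned. */
void pendsv_request(volatile uint32_t *icsr)
{
    *icsr = ICSR_PENDSVSET;
}
```

    On the target the calls would be along the lines of pendsv_request((volatile uint32_t *)SCB_ICSR_ADDR).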

    The CPU automatically saves several registers when entering an interrupt and, together with the way the C compiler allocates registers, it’s possible to write interrupt handlers purely in C with no assembly wrapper or __attribute__ or pragma statements needed. The vector table is also simple: each table entry is the 32-bit address of the associated handler. The table itself can be relocated, if needed.

    The LPC1768 has a full complement of peripherals expected on a modern 32-bit MCU, including eight channels of 12-bit analog to digital conversion, a 10-bit DAC, Ethernet, USB 2.0, CAN, UARTs, SPI, I2C, timers, PWM, motor control PWM, watchdog timer, and real-time clock. All of these are modern implementations that do not require unnecessary jumping through hoops on the part of the programmer in order to implement drivers.

    I’ve not implemented drivers for all of these yet, so can only comment on the few I have worked with. The UARTs are pretty standard. They have the usual transmit and receive FIFOs and the receive side has a character timeout interrupt for those times when fewer characters arrive than the current FIFO threshold. Baud rate calculation is somewhat involved (the datasheet has a full page flowchart detailing the algorithm). I created a spreadsheet to calculate common rates and use a table-driven approach in my driver.
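
    A table-driven baud setup might look something like the sketch below. It relies on the LPC17xx UART formula, baud = PCLK / (16 x (256 x DLM + DLL) x (1 + DivAddVal / MulVal)). The two table entries are illustrative values worked out for an assumed 25 MHz PCLK; they are not taken from the spreadsheet mentioned above, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t baud;       /* nominal rate */
    uint16_t dl;         /* 16-bit divisor: 256 * DLM + DLL */
    uint8_t  divaddval;  /* fractional divider numerator (0..14) */
    uint8_t  mulval;     /* fractional divider denominator (1..15) */
} uart_baud_entry;

/* Illustrative entries, assuming PCLK = 25 MHz */
static const uart_baud_entry baud_table[] = {
    {   9600u, 93u, 3u,  4u },
    { 115200u, 10u, 5u, 14u },
};

/* Look up the divisor settings for a nominal rate; NULL if not tabulated. */
const uart_baud_entry *uart_find_baud(uint32_t baud)
{
    for (size_t i = 0; i < sizeof baud_table / sizeof baud_table[0]; i++) {
        if (baud_table[i].baud == baud)
            return &baud_table[i];
    }
    return NULL;
}

/* Rate actually produced by an entry, truncated to whole bits per second. */
uint32_t uart_actual_baud(uint32_t pclk, const uart_baud_entry *e)
{
    uint64_t num = (uint64_t)pclk * e->mulval;
    uint64_t den = 16u * (uint64_t)e->dl * (uint64_t)(e->mulval + e->divaddval);
    return (uint32_t)(num / den);
}
```

    With these entries the 115200 setting actually produces about 115131 baud at 25 MHz, an error of roughly 0.06%.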

    The I2C channels support 100 kHz and 400 kHz data rates. The I2C Status register provides five bits of state information indicating the current bus state. The user’s guide has a full complement of flow charts and state tables indicating the various state transitions. I’ve only implemented master mode in my driver so far, and found it to be fairly straightforward and trouble free.
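
    To give a flavor of how those state tables turn into driver code, here is a sketch of the decision step for a master-mode transmit. The status codes are the usual master-transmitter values for this style of I2C block as best I recall them, so check them against the user's guide. The enum and function names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    I2C_SEND_ADDR,   /* load slave address + W */
    I2C_SEND_DATA,   /* load next data byte */
    I2C_SEND_STOP,   /* finish the transfer */
    I2C_ABORT        /* give up and report an error */
} i2c_action;

/* Decide what the interrupt handler should do next for a master write. */
i2c_action i2c_master_tx_next(uint8_t status, unsigned bytes_left)
{
    switch (status & 0xF8u) {            /* state lives in the upper five bits */
    case 0x08:                           /* START transmitted */
    case 0x10:                           /* repeated START transmitted */
        return I2C_SEND_ADDR;
    case 0x18:                           /* address + W sent, ACK received */
    case 0x28:                           /* data byte sent, ACK received */
        return bytes_left ? I2C_SEND_DATA : I2C_SEND_STOP;
    case 0x20:                           /* address + W sent, NACK received */
    case 0x30:                           /* data byte sent, NACK received */
        return I2C_SEND_STOP;
    default:                             /* 0x38 arbitration lost, or unexpected */
        return I2C_ABORT;
    }
}
```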

    SPI support is pretty standard too and I didn’t have any problems implementing a driver. The SPI peripheral supports Motorola SPI, 4-wire TI SSI, and National Semiconductor Microwire modes. Frame size support ranges from 4-16 bits. I wish more MCU makers would expand this to 4-32 bits as I’ve encountered a number of parts with odd-ball SPI interfaces that have 24 bit frames and such. These can be supported using 16 bit frames and manual control of the slave select line, which is not optimum, but it works.
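
    One way to handle a 24-bit part within the 4-16 bit frame limit is to configure the controller for 12-bit frames, hold slave select low by hand, and send two frames back to back. That is a variation on the 16-bit approach mentioned above, not necessarily what was done here. The helpers below only do the splitting and joining, assume MSB-first transfer, and use hypothetical names.

```c
#include <stdint.h>

/* Split a 24-bit word into two 12-bit frames, most significant first. */
void spi24_split(uint32_t word, uint16_t frames[2])
{
    frames[0] = (uint16_t)((word >> 12) & 0x0FFFu);
    frames[1] = (uint16_t)(word & 0x0FFFu);
}

/* Reassemble two received 12-bit frames into one 24-bit word. */
uint32_t spi24_join(const uint16_t frames[2])
{
    return ((uint32_t)(frames[0] & 0x0FFFu) << 12) |
            (uint32_t)(frames[1] & 0x0FFFu);
}
```

    The driver would assert slave select, transfer frames[0] and then frames[1], and release slave select only after the second frame completes.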

    The real-time clock is a “real” RTC in that it has its own clock and power domains. Most RTCs that I’ve encountered on MCUs recently have separate clock domains (usually 32.768 kHz), but few have separate power domains. The LPC1768 has a VBAT pin that can be connected to a 3 V lithium button cell to keep the RTC powered when the CPU itself is not powered. Most MCUs will only keep time when the CPU itself has power, which is about as useless as a screen door on a submarine.

    I haven’t done anything with the Ethernet, USB, CAN, or motor control PWM peripherals yet, so can’t comment on them here.

    My development environment consists of Rowley’s CrossWorks for ARM and a Segger J-LINK JTAG unit. Both of these work very well. I’ve used many IDEs over the years and CrossWorks rates right up there at the top. It’s fast, clean and uncluttered, and doesn’t get in my way when debugging. The CrossWorks/J-LINK combination downloads code to the LPC1768 FLASH in a heartbeat, and single stepping performance is fantastic with no perceptible lag or delay. I rate both of these tools as A+.

    After working with the LPC1768, I turned my attention to the Microchip PIC32. When Microchip implemented their 32-bit MCU, they didn’t follow in the footsteps of most of their fellow vendors by choosing an ARM core. They instead went to MIPS and chose the M4K core. MIPS have been around since the mid-1980s and their cores are solid and well-understood.

    The M4K architecture is more RISC-like than the Cortex-M3. There are fewer, if any, instructions like the “stm” instruction, which stores multiple registers in memory in a single instruction. The interrupt mechanism is simpler as well, making the programmer do part of the work himself. For example, on an interrupt, an ARM Cortex-M3 will automatically stack r0-r3, r12, lr, psr, and the pc. The M4K core saves nothing on an interrupt (except for the return address). It’s up to code to save/restore any registers used by the interrupt handler. This means that interrupt handlers written in C require the compiler to generate prolog/epilog code. This is done via __attribute__((interrupt)) or pragma statements in C code.

    The PIC32 has two software interrupts whose function is similar to the Cortex-M3’s PendSV implementation. Interrupt priorities range from 1-7, which is more than enough for most systems, especially when you take into account the four supported levels of subpriority. There’s a system level counter that can generate interrupts just like the Cortex-M3’s SysTick timer. The interface takes a little more work, but the basic functionality is the same.

    The PIC32 peripherals will be mostly familiar to anyone who’s used an 8-bit or, particularly, a 16-bit PIC. As expected, the PIC32 supports all of the common peripherals, including Ethernet, USB 2.0, CAN, UART, ADC, DAC, PWM, I2C, SPI, timers, RTC, WDT, etc. In general, these peripherals are somewhat lesser featured than those on the LPC1768. For example, while the UARTs have transmit and receive FIFOs, the receive side doesn’t have a character timeout interrupt. This mechanism must either be implemented by polling or through the use of a general purpose timer, which makes the driver more complex.

    Another omission lies in the real-time clock/calendar. While it does have a separate 32.768 kHz clock domain, it doesn’t have a separate power domain, so it only runs while the main CPU is powered. It’s things like this that make me wonder “what were they thinking?” when they designed this peripheral.

    My PIC32 development environment consisted of a chipKIT Max32 development board from Digilent, and a PICkit 3 in-circuit debugger from Microchip. The Max32 is intended to be Arduino compatible, but I did not use it with that in mind. It also supports standard Microchip in-circuit debugging tools, such as the PICkit 3 and the ICD3. The Max32 sports a PIC32MX795F512 processor running at 80 MHz, 512KB FLASH, and 128KB RAM. The board itself is hardware compatible with many existing Arduino shields.

    I used the recently released MPLAB-X 1.1 IDE from Microchip and the C32 compiler. The IDE is free and the compiler is free in a form that does not support compiler optimizations. The IDE itself is an improvement on Microchip’s somewhat ancient MPLAB tools, and is not bad for only recently being out of beta. It still has a few rough edges, however, which I’m sure will go away as newer revisions come out.

    One big disappointment is the speed of the PICkit 3. It’s slow, really slow. Downloading my RTOS code (about 25 kB code and data) to the PIC32 takes around 25 seconds, and single-stepping C code is completely unusable, as it takes roughly 12-15 seconds to step a single line of code. I haven’t been able to determine if this is normal behavior for the PICkit 3 (it’s Microchip’s $39 bottom-of-the-line debugger, and it only supports 12 Mbit/sec full-speed USB). The next step up is the ICD3, which runs $200. I may decide to splurge on an ICD3 if it has reasonable performance.
    Jerry Gardner, Mar 27, 2012
  3. Jerry Gardner

    Part 2:

    The last MCU in my recent adventures, the Atmel AVR32, is somewhat of an oddball. Rather than going to an external source for a core like NXP (ARM) or Microchip (MIPS) did, Atmel chose to design their own core. There are two main variants of the AVR32: the AP7, which is designed to compete with high-end ARM chips like the ones used in the iPhone, and the UC3, which is intended for the embedded MCU market. I focus on the UC3 here.

    The AVR32 UC3 is more like the Cortex-M3 than the PIC32. It has many time-saving features, such as instructions that can read/write multiple registers from/to memory and DSP instructions. The interrupt mechanism, in my opinion, is needlessly complex. Unlike the Cortex-M3, which simply dispatches each interrupt through its own dedicated vector, the AVR32 typically requires some C code to determine which group and which request is active. This is how the example code provided by Atmel works. This adds complexity and latency where none is typically desired.

    I took a simpler approach for my RTOS, which does not need to register interrupt handlers at run time. I created a two-level jump table (that resides in FLASH) that requires only two jump instructions to get to an interrupt handler. For those interrupt groups that include more than one peripheral, such as Group 1, I leave it up to the handler itself to determine which peripheral is interrupting.
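
    The exact layout of that jump table isn't spelled out here, but the general idea of a compile-time, two-level dispatch can be sketched in C with const arrays of function pointers. Everything below (group numbering, handler names, the counters) is hypothetical and only illustrates the shape.

```c
#include <stddef.h>

typedef void (*isr_fn)(void);

typedef struct {
    const isr_fn *lines;   /* second level: one handler per request line */
    size_t        count;
} irq_group;

/* Hypothetical handlers that only count invocations. */
unsigned timer_irq_count;
unsigned uart_irq_count;
static void timer_isr(void) { timer_irq_count++; }
static void uart_isr(void)  { uart_irq_count++; }

static const isr_fn group0_lines[] = { timer_isr };
static const isr_fn group1_lines[] = { uart_isr };

/* First level: indexed by interrupt group. Everything is const and
 * built at compile time, so the tables can live in flash. */
static const irq_group irq_table[] = {
    { group0_lines, sizeof group0_lines / sizeof group0_lines[0] },
    { group1_lines, sizeof group1_lines / sizeof group1_lines[0] },
};

/* Two indexed lookups reach the handler. Returns -1 for an unknown source. */
int irq_dispatch(size_t group, size_t line)
{
    if (group >= sizeof irq_table / sizeof irq_table[0])
        return -1;
    if (line >= irq_table[group].count)
        return -1;
    irq_table[group].lines[line]();
    return 0;
}
```

    Because nothing is registered at run time, there is no RAM table to initialize and no lookup beyond the two array indexes.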

    Although the CPU saves basic state information on interrupt like the Cortex-M3, ISRs written in C still require an __attribute__((interrupt)) modifier. The Cortex-M3 sets the link register to a magic number, which allows it to use the standard function return mechanism to return from an interrupt, whereas the AVR32 uses a separate “rete” (return from event handler) instruction to return from an interrupt, hence the need for special code in an interrupt handler.

    Like the Cortex-M3 and the PIC32, the AVR32 has a core timer that can be used to generate periodic interrupts such as those needed for the tick timer of an RTOS. Strangely, the AVR32 does not provide a software interrupt mechanism. It has an “scall” instruction, but this is a synchronous exception, which is handled immediately, not an interrupt that is held off until the processor priority allows it (like the PendSV mechanism on the Cortex-M3 and software interrupts on the PIC32). Software interrupts can be simulated, however, using the GPIO interrupt facility. An unused GPIO line can be set up to generate an interrupt and any state change generated by writing to that line’s output value register will generate the interrupt. Crude, but it works.

    The AVR32, like its little brother the AVR8, has all of the usual peripherals available, and then some, including Ethernet, USB 2.0, ADC, CAN, SPI, I2C, UART, etc. Most of these peripherals are fairly modern and correspond roughly with their LPC1768 and PIC32 equivalents.

    The ADC is only 10 bits, like the PIC32, but unlike the LPC1768, which has a 12-bit ADC. The UARTs don’t have FIFOs, which is a strange omission. The real-time counter (RTC) is a primitive real-time clock. It has its own clock domain, but not power domain, and doesn’t keep track of time, just counts ticks of the RC oscillator or 32.768 kHz clock. I haven’t had time to write drivers for any of the other peripherals, so can’t comment on them here.
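
    Since the counter only counts ticks, turning it into something clock-like is left to software. A minimal sketch, assuming the raw count and the effective tick rate after any prescaler are both known (names are hypothetical):

```c
#include <stdint.h>

typedef struct {
    uint32_t days;
    uint8_t  hours;
    uint8_t  minutes;
    uint8_t  seconds;
} elapsed_time;

/* Convert a raw tick count into elapsed days, hours, minutes and seconds.
 * A tick rate of 0 is treated as "no time elapsed". */
elapsed_time rtc_ticks_to_elapsed(uint32_t ticks, uint32_t ticks_per_second)
{
    elapsed_time t = { 0u, 0u, 0u, 0u };
    if (ticks_per_second == 0)
        return t;

    uint32_t secs = ticks / ticks_per_second;
    t.days    = secs / 86400u;
    secs     %= 86400u;
    t.hours   = (uint8_t)(secs / 3600u);
    secs     %= 3600u;
    t.minutes = (uint8_t)(secs / 60u);
    t.seconds = (uint8_t)(secs % 60u);
    return t;
}
```

    This yields elapsed time only. Without a separate power domain the count is lost when the CPU loses power, so wall-clock time still has to be set from somewhere after every power-up.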

    The AVR32 documentation is rather poor, in my opinion. Most needed information is there, just scattered across many documents and hard to navigate. One glaring omission is a description of the C calling convention. This is nowhere to be found in the published documentation. I had to reverse-engineer it by looking at disassembled C code. I’ve been told that Atmel has this information, but just hasn’t published it. Why? Is it considered a trade secret?

    My AVR32 development environment is an Atmel EVK1100 development board and the Atmel AVR ONE! debugger. The EVK1100 has an AT32UC3A0512 MCU with 512kB FLASH, 64kB RAM, and runs up to 66 MHz. The EVK1100 is a nice board overall, with some nice features, including a backlit 4x20 LCD display, a temperature sensor, a light sensor, LEDs, buttons, joystick, and a prototyping area. There are some design choices on the board that again leave me scratching my head and asking myself “what were they thinking?” The board breaks out all of the MCU signals to a row of holes intended to be used with headers, but for some unfathomable reason they require 0.05” (1.27mm) headers instead of standard 0.1” headers. I don’t know about you, but I have hundreds of 0.1” headers in my parts collection, but not a single 0.05” header is to be found. I’ll have to order some 0.05” twin row headers, and I’m not looking forward to soldering them given the small pin spacing and dual rows.

    Another poor design choice is the use of a 38-pin Mictor connector for the AVR ONE! debugger connector. The $600 AVR ONE! supports trace, which requires the Mictor connector, otherwise it can use the standard JTAG connector (dual row 0.1” header). The Mictor connector is uncommon, expensive, delicate, and difficult to solder (it’s surface mount, not through hole). I wish Atmel had used a JTAG-like connector instead.

    The software development environment is somewhat schizophrenic. The original environment was AVR32 Studio, which was based on Eclipse and GCC, and is AVR32-specific. The new environment is AVR Studio 5.1, which is based on Microsoft Visual Studio, and supports both 8-bit and 32-bit AVRs. And finally there’s Atmel Studio 6, which adds ARM support to AVR Studio 5.1 (and hence the name change from AVR to Atmel). Studio 6 is still in beta.

    If you use Linux or Mac OS X, you’re out of luck with Studio 5.1 or 6. It may work in a Windows virtual machine, but I haven’t tried that. Although I’ve used (and like) Visual Studio a lot, the Atmel incarnation omits some features, which is puzzling.

    The Atmel Studio 6/AVR ONE! combination is fast at downloading code to the board and single-stepping C and assembly code, but not quite as fast as the CrossWorks/J-LINK on the ARM. For some reason, AVR32 Studio supported the AVR ONE! trace feature, but AVR Studio 5.1 and Atmel Studio 6 do not. I’m sure there are more than a few people out there who dropped $600 on an AVR ONE! who are fuming about this omission.


    CPU Architecture
    ARM Cortex-M3: A
    PIC32: C+
    AVR32: B

    Peripherals
    LPC1768: A-
    PIC32: B
    AVR32: B-

    Documentation
    LPC1768: A-
    PIC32: A
    AVR32: C-

    In-Circuit Debugger
    J-LINK: A+
    PICkit 3: F
    AVR ONE!: B+

    IDE
    CrossWorks: A+
    MPLAB-X: B+
    AVR Studio: B-
    Jerry Gardner, Mar 27, 2012