ARM's v7 MMU

Discussion in 'Embedded' started by Don Y, May 6, 2014.

  1. Don Y

    Hi,

    Any pointers as to idiosyncrasies in ARM's v7 MMU? (no, I'm
    not looking for information as to how to *use* it; rather,
    pointers regarding any "unexpected behaviors" that I might
    encounter -- especially when mixing page sizes, etc.)

    Also, any pointers to particular silicon to avoid/favor
    in terms of potential problems in the MMU implementation?

    Thx!
    --don
     
    Don Y, May 6, 2014
    #1
  2. There are 4 indirect things I've come across:

    1) As you will be aware, the whole caching/buffering subsystem was totally
    reworked for ARMv6 and the ARMv5 subsystem is no longer supported in ARMv7.
    I've found the configuration on memory/device regions is much more
    sensitive/fragile than it is with ARMv5 devices when the MMU is enabled.

    A specific example: if you are experiencing device lockups when enabling
    the MMU, try changing the device type attributes in the paging table for
    the peripheral region.

    2) If you are using the MMU on a device with Security Extensions enabled,
    don't forget that some register bits which are otherwise R/W become R/O
    in Non-Secure mode.

    3) Don't forget that on ARMv7 class devices, some register updates may
    be posted across a bus meaning they are not updated immediately. When
    you turn on instruction and data caching, interrupt handling code can
    run fast enough that you get a race condition with the interrupt hardware
    firing the interrupt for a second time unless you use the usual DSB
    instructions.

    I've seen this happen on the AM3359.

    4) There's no longer any way to invalidate the whole data cache in one
    go. You now have to do it by MVA or set/way.

    The AM3359 in the BeagleBone Black caused me way more trouble than the
    Allwinner A10s did. However, the AM3359 is heavily documented (unlike
    the Chinese jobs... :-()
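
    To make point 4) concrete, here is a minimal sketch of how the set/way
    operands can be generated from the cache geometry reported in CCSIDR.
    The field layout follows the ARMv7-A architecture manual as I understand
    it. All struct and function names are invented for illustration, and
    record_operand() is only a host-side stand-in: on a real target it would
    be replaced by the privileged MCR that performs the invalidate
    (DCISW, "mcr p15, 0, <Rt>, c7, c6, 2"), with a DSB after the loop.

```c
#include <stdint.h>

/* Geometry of one cache level, decoded from an ARMv7 CCSIDR value. */
struct cache_geom {
    uint32_t line_shift; /* log2(line length in bytes) */
    uint32_t ways;       /* associativity */
    uint32_t sets;       /* number of sets */
};

static struct cache_geom decode_ccsidr(uint32_t ccsidr)
{
    struct cache_geom g;
    g.line_shift = (ccsidr & 0x7u) + 4u;       /* LineSize = log2(words) - 2 */
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1u;    /* Associativity field + 1 */
    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1u;  /* NumSets field + 1 */
    return g;
}

/* ceil(log2(n)) for n >= 1 */
static uint32_t ceil_log2(uint32_t n)
{
    uint32_t bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/* Operand for a set/way maintenance operation; level 0 selects L1. */
static uint32_t set_way_operand(struct cache_geom g, uint32_t level,
                                uint32_t set, uint32_t way)
{
    uint32_t way_bits = ceil_log2(g.ways);
    uint32_t operand = (set << g.line_shift) | (level << 1);

    if (way_bits != 0)                 /* direct-mapped caches have no way field */
        operand |= way << (32u - way_bits);
    return operand;
}

/* Host-side stand-in (hypothetical): records the operand instead of issuing MCR. */
static uint32_t last_operand;
static void record_operand(uint32_t operand)
{
    last_operand = operand;
}

/* Apply op to every line of one cache level; returns the number of lines visited. */
static uint32_t walk_cache_level(uint32_t ccsidr, uint32_t level,
                                 void (*op)(uint32_t))
{
    struct cache_geom g = decode_ccsidr(ccsidr);
    uint32_t count = 0;

    for (uint32_t way = 0; way < g.ways; way++) {
        for (uint32_t set = 0; set < g.sets; set++) {
            op(set_way_operand(g, level, set, way));
            count++;
        }
    }
    return count;
}
```

    For a 4-way cache with 128 sets and 64-byte lines, the walk issues
    4 * 128 = 512 operations, which is the "extra cost" compared with a
    single invalidate-all instruction.
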

    Simon.
     
    Simon Clubley, May 7, 2014
    #2
  3. Hi Don,

    if on that version they still have that ridiculous MMU tagging
    pages by logical address - so you have to flush all caches etc. on
    task switch - maybe your best chance is to simply disable it,
    if you have the option (I don't know ARM). Or switch to a Power
    Architecture processor; their MMUs work OK.

    Dimiter
     
    Dimiter_Popoff, May 7, 2014
    #3
  4. The problem is the caches on ARM don't work unless the MMU is enabled.

    Can you pick up capable, battery-operated small Power boards for
    about 20-30 British pounds?

    You can for ARM but you can't for Power (at least the last time
    I checked) which is why there's a hobbyist and experimenting
    ecosystem around ARM but there isn't around Power.

    Simon.
     
    Simon Clubley, May 7, 2014
    #4
  5. I know, for whatever reason Power is kept out of reach for the
    hobbyist market. There are (very) powerful chips which allow sub-$100
    boards but that's about all.

    However, I don't think that would stop Don, my guess is he is
    just looking for the cheapest hardware which will do the job
    for him - which may well be ARM-based, but then maybe not.

    Dimiter
     
    Dimiter_Popoff, May 7, 2014
    #5
  6. Don Y

    Hi Simon,

    Yeah, I really would have liked the "tiny" page size (actually,
    even tiny *quarterpages*!). Aside from the (slight) performance
    gain, the sections/supersections I'd gladly trade away in that
    case!

    Is this just a case of "doing extra homework" (i.e., making sure you
    understand the repercussions of each flag setting)? Or, do certain
    targets behave differently (thus *requiring* different settings)?
    I assume you mean beyond the obvious "make sure the page is wired down",
    cacheability setting, etc.?

    Said another way (for all of the above), when you discover(ed) the
    source of the problem, did you slap your head and utter "D'oh!"
    (i.e., "damn, I should have known better!") *or* did you find
    yourself uncomfortably wondering why *that* fixed the problem?

    [The former I can deal with; the latter would leave me anxious!]

    I'm not sure I understand your point. Can you embellish with an example?

    Hmmm.... this could be annoying. OTOH, there are few cases where
    I would need to invalidate more than just a cache line, "typically".
    So, the extra cost/complexity may disappear in practice.

    Was the "trouble" attributable to "learning curve"? I.e., did the
    A10 benefit from "previous experience" on the BB?

    Thanks!
    --don
     
    Don Y, May 7, 2014
    #6
  7. Don Y

    Hi Simon & Dimiter,

    I'm not really interested in (ready-made) "boards" but the
    point is the same -- I want inexpensive and low power (that
    also tends to suggest a high level of integration).

    My power budget per node (including all "I/O loads") is ~10W.
    In some cases, much of that 10W is I/O so the processor needs
    to be in the 1-2W ballpark.

    There's (I think) also a bigger selection (vendors, configurations)
    with ARM.

    --don
     
    Don Y, May 7, 2014
    #7
  8. Don Y

    Hi Dimiter,

    Yes, I'd like to keep cost and power requirements down.
    OTOH, I am (now) trying to cut some (development) corners.

    My original design would have required me to create *three*
    different RTOS's with compatible features/capabilities as
    they would execute on targets at different price/complexity
    points (e.g., "Intel", Cortex-A and Cortex-M). Hard to
    get such a heterogeneous system to "play nice together" :<

    OTOH, if I can 86 (ha!) the Intel targets, that gives me another
    degree of freedom in the design (and, *forces* me to steer clear
    of that ever-changing platform!).

    Now, I'm trying to rationalize replacing the Cortex-M devices
    with (more expensive) Cortex-A's... just to eliminate yet another
    variation and have a truly homogeneous system! Size may prove
    to be a problem...

    --don
     
    Don Y, May 7, 2014
    #8
  9. The latter.

    When I took the perfectly working settings from the A10s for the
    peripheral memory region to the BBB, the BBB locked up solid every time
    the MMU was enabled.

    Turns out that on the AM3359, the peripheral memory region must be marked
    as shareable device or it simply will not work. Marking the region as
    non-shareable device caused a solid lockup every time. This was not an
    issue on the A10s.
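
    For readers who have not looked at the table format, the difference
    between the two settings is only a couple of bits in the descriptor.
    The following is a minimal sketch, assuming the ARMv7-A short-descriptor
    format with 1 MB sections and TEX remap disabled. The macro and function
    names are made up for illustration, the example address in the note
    below is an assumption about where the AM335x peripherals live, and
    domain 0 is assumed to be configured as "client" in the DACR. It is not
    the code used in the posts above.

```c
#include <stdint.h>

/* First-level "section" descriptor fields (short-descriptor format). */
#define SECTION_TYPE      (0x2u)                        /* bits[1:0] = 0b10 */
#define SECTION_B         (1u << 2)                     /* Bufferable */
#define SECTION_C         (1u << 3)                     /* Cacheable */
#define SECTION_XN        (1u << 4)                     /* eXecute Never */
#define SECTION_DOMAIN(d) (((uint32_t)(d) & 0xFu) << 5)
#define SECTION_AP_RW     (0x3u << 10)                  /* AP[2:0] = 0b011, full access */
#define SECTION_TEX(t)    (((uint32_t)(t) & 0x7u) << 12)

enum device_kind { DEVICE_SHAREABLE, DEVICE_NON_SHAREABLE };

/* Build a section descriptor mapping 1 MB of peripheral space as Device memory. */
static uint32_t device_section(uint32_t phys_base, enum device_kind kind)
{
    uint32_t attrs = (kind == DEVICE_SHAREABLE)
        ? (SECTION_TEX(0) | SECTION_B)   /* TEX=000, C=0, B=1: Shareable Device */
        : SECTION_TEX(2);                /* TEX=010, C=0, B=0: Non-shareable Device */

    return (phys_base & 0xFFF00000u)     /* section base address, bits[31:20] */
         | attrs
         | SECTION_AP_RW
         | SECTION_DOMAIN(0)
         | SECTION_XN                    /* never fetch instructions from devices */
         | SECTION_TYPE;
}
```

    With a section base of 0x44E00000, the shareable variant encodes as
    0x44E00C16 and the non-shareable variant as 0x44E02C12, so the two
    descriptors differ only in the TEX[1] and B bits.
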

    Oh, yes. I went through all those (and more) before discovering the
    solution. I still cannot find anything which explains why the above
    is required on the AM3359 but not on the A10s.

    The latter. I could not find anything in the ARM architecture manuals,
    the AM3359 TRM or other documents about why two Cortex-A8 MCUs behave
    so differently. That makes me nervous.

    This is on the AM3359 with my own interrupt wrapper written in ARM
    assembly which is executed when the IRQ exception vector is triggered.

    The IRQ interrupt wrapper determines which interrupt handler to call
    (UART, timer, etc) and calls it.

    In the interrupt handler you write to a peripheral (say timer) register
    to say you have handled the interrupt and then return back to the IRQ
    interrupt wrapper.

    The IRQ interrupt wrapper writes to the AM3359 interrupt registers
    telling it the interrupt controller can search for a new interrupt.

    When both instruction and data caching are turned on, my code runs
    sufficiently fast that the write to the timer interrupt acknowledge
    register is still making its way across the bus and the interrupt
    controller thinks the interrupt is still pending because there's no
    longer a coherent view of resources.

    The solution is to use a Data Synchronisation Barrier (DSB) instruction
    sometime between writing the timer interrupt acknowledge register and
    telling the interrupt controller it can look for a new interrupt.

    If you read the AM3359 Technical Reference Manual, you will see the use
    of a DSB is discussed in relation to writing to the above mentioned
    interrupt controller register and the same reasoning can apply to the
    peripheral interrupt acknowledge registers as well.
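
    The ordering described above can be sketched in C as follows. This is
    only an illustration under stated assumptions, not the assembly wrapper
    from the post: the struct, the function names and the INTC_NEWIRQAGR
    bit value are placeholders I have chosen, and the register addresses are
    passed in as pointers so the same code can be exercised on a host. On
    an ARMv7 (or later) build the barrier is a real DSB; elsewhere a
    compiler fence stands in for it.

```c
#include <stdint.h>

/* Assumed bit in the interrupt controller's control register that lets it
 * search for the next IRQ (placeholder value, check the device TRM). */
#define INTC_NEWIRQAGR 0x1u

static inline void data_sync_barrier(void)
{
#if defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    __asm__ volatile("dsb sy" ::: "memory");   /* wait for outstanding writes */
#else
    __atomic_thread_fence(__ATOMIC_SEQ_CST);   /* host stand-in, not a real DSB */
#endif
}

/* Registers touched at the end of interrupt handling (hypothetical layout). */
struct irq_regs {
    volatile uint32_t *periph_ack;    /* e.g. a timer's interrupt status register */
    uint32_t           ack_value;     /* value that clears the peripheral interrupt */
    volatile uint32_t *intc_control;  /* interrupt controller "new IRQ" register */
};

/* Acknowledge the peripheral, then allow the controller to look for a new IRQ. */
static void ack_and_rearm(const struct irq_regs *r)
{
    *r->periph_ack = r->ack_value;    /* step 1: handler clears the source */
    data_sync_barrier();              /* step 2: make sure that posted write landed */
    *r->intc_control = INTC_NEWIRQAGR;/* step 3: controller may pick the next IRQ */
    data_sync_barrier();              /* step 4: and that write too, before IRQs return */
}
```

    Without the first barrier, step 3 can reach the interrupt controller
    while the peripheral still asserts its request, which matches the
    spurious second interrupt described above.
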

    The A10s, with its poor documentation, came first for me.

    I managed to figure things out on the A10s even with this poor
    documentation and I still got tripped up on the BBB when I later
    started playing with that.

    Older ARM MCUs used to have such nice predictable behaviour...

    Simon.
     
    Simon Clubley, May 7, 2014
    #9
  10. Don Y

    Hi Simon,

    Sorry, my bad. :< By "targets" I meant "regions of memory" (i.e.,
    different I/O devices in the same system). It *appears* that the
    settings you eventually came up with work *universally* for all
    "(I/O) devices" within a given "MCU target" -- but, that the
    settings for MCU target #1 differ from those for MCU target #2.

    Is this a correct assessment?

    Do all of the "(I/O) devices" on that part fit in a single page/map?
    I.e., do you *replicate* the settings for the devices that reside
    at one part of the address space to devices that reside at other
    parts of the address space? (or, do you throw them all in a "section")

    <frown> And, not likely you are going to have N other MCUs to compare
    against (to determine *which* of these is the "exception"). :<

    No help from manufacturer? Forums?

    Will the A10 "behave" if configured as the AM3359? Or, does your
    code make assumptions that require it to be configured thusly?

    What are the design consequences of each configuration?

    Agreed. At the very least, have it documented as a "bug"/anomaly so
    you can at least know that "they" are aware of it -- and, will either
    act to preserve this behavior *or* alert folks to any *changes* to it.

    Ah... also makes sense.

    Logical choice (all else being equal) is to do so in the dispatcher
    (as it allows the most time for any previous code to "complete")

    Yes. I think the Cortex-A's are suffering from a desire to follow
    the "path" of other "big" (complex) processors (e.g., x86) along
    with all their cruft.

    One other question: is your use of the MMU largely "static"
    (i.e., set it and forget it); somewhat dynamic (using it to
    create individual protection domains for different processes);
    or even more "esoteric"? The intent of this question being to
    see how likely other "races" and anomalies are likely to have
    been stumbled upon in your codebase.

    Thanks!
    --don
     
    Don Y, May 8, 2014
    #10
  11. Yes, it is.

    The major surprise was finding two single CPU Cortex-A8 MCUs having
    different requirements. There was nothing in the AM3359 material I
    have read which indicated that only one of the ARMv7 architecture level
    options for mapping peripheral address space was available on the
    AM3359 or that this was even a potential issue.

    Yes. All the peripheral address space mappings have the same attributes.

    I have not tried that, but may in the future.

    Although I am a programmer/sys admin by day, that is on commercial
    systems, with typical commercial type programming and tasks.

    My embedded work is purely a hobby and right now I am deeply into
    other hobbyist interests. :)

    BTW, the help from the manufacturer is in the form of their StarterWare
    example code library. Unfortunately, while _every_ other manufacturer
    of ARM MCUs I have come across gladly places their example code on their
    website for free download, TI have placed their _example_ code under
    bl**dy export control!!! :-(

    I registered to download it and was denied access. TI support would not
    talk to me about granting access to the StarterWare kit unless I provided
    them with a range of personal information to establish my identity.
    (And BTW, this British guy living in the UK is _still_ annoyed about
    that.)

    I recently discovered the StarterWare kit has been uploaded to GitHub
    and I cannot see _anything_ in there which has any restricted, NDA or
    security issues at all. :-(

    Basically none I was aware of in single MCU systems. When adding MMU
    support to an existing A10s bare metal project, I basically read the
    ARMv7 architecture manual section in question before writing a single
    line of code, chose what looked like valid options for the MMU tables,
    and then wrote the code and mapping tables.

    After a few silly issues, the code pretty much worked the first time.
    Based on what I read in the architecture manuals and, later, the AM3359
    TRM, I had no reason to believe the same attributes would not work as-is
    on another Cortex-A8 MCU for the corresponding regions on that other MCU.

    That's exactly where I placed it. :)

    I think ARM are starting to lose the clean-and-elegant approach which
    has served them so well up to now.

    Largely static, with virtual to physical address mapping equivalence.
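
    A "static, virtual equals physical" setup of this kind can be as small
    as one flat first-level table. The sketch below is a guess at what such
    a table builder might look like, not the code from the project described
    above. It assumes the ARMv7-A short-descriptor format with 1 MB sections,
    TEX remap disabled and domain 0 set to "client"; the names, the region
    struct and the example address ranges in the note below are all
    hypothetical. On a target the table would also need 16 KB alignment and
    its address written to TTBR0 before the MMU is enabled.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_ENTRIES 4096u               /* 4096 x 1 MB sections = 4 GB */

#define SEC_TYPE   0x2u                /* bits[1:0] = 0b10: section */
#define SEC_B      (1u << 2)
#define SEC_C      (1u << 3)
#define SEC_XN     (1u << 4)
#define SEC_AP_RW  (3u << 10)          /* full read/write access */

/* Normal memory, write-back cacheable (TEX=000, C=1, B=1). */
#define ATTR_NORMAL_WB (SEC_C | SEC_B | SEC_AP_RW | SEC_TYPE)
/* Shareable Device memory, execute-never (TEX=000, C=0, B=1). */
#define ATTR_DEVICE_SH (SEC_B | SEC_XN | SEC_AP_RW | SEC_TYPE)

struct region {
    uint32_t base;    /* physical (and virtual) start address */
    uint32_t size;    /* length in bytes */
    uint32_t attrs;   /* one of the ATTR_* values */
};

/* Fill a flat L1 table: listed regions map VA == PA, everything else faults. */
static void build_identity_map(uint32_t *l1, const struct region *regions, size_t n)
{
    for (uint32_t i = 0; i < L1_ENTRIES; i++)
        l1[i] = 0;                                   /* invalid entry => fault */

    for (size_t r = 0; r < n; r++) {
        if (regions[r].size == 0)
            continue;
        uint32_t first = regions[r].base >> 20;
        uint32_t last  = (regions[r].base + regions[r].size - 1u) >> 20;
        for (uint32_t mb = first; mb <= last; mb++)
            l1[mb] = (mb << 20) | regions[r].attrs;  /* section base == index */
    }
}
```

    For example, a two-entry region list covering 0x44C00000..0x44FFFFFF as
    ATTR_DEVICE_SH and 0x80000000..0x8FFFFFFF as ATTR_NORMAL_WB leaves every
    other megabyte unmapped, so stray accesses abort instead of silently
    hitting memory.
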

    I have quite a bit of experience with bare metal code on earlier ARM
    MCUs, but I wanted to explore the Cortex-A8 at bare metal level,
    basically just to learn about it (and for fun :)).

    Simon.
     
    Simon Clubley, May 8, 2014
    #11
  12. Don Y

    Hi Simon,

    OK. Then, the next question (sorry, I don't necessarily expect you
    to have explored all of these options -- I'm just thinking aloud
    and wondering how you might opine regarding them) is: could the
    problem, perhaps, have been associated with a *particular* I/O
    device? Or, did you observe the problem plaguing *all* I/O's?

    Hmmm... I've been approaching this from the other end: starting with
    the generic ARM documents before selecting a particular device (and
    then chasing down the manufacturer's docs *for* that device).

    There are quite a few areas in the memory management description where
    they hand-wave and resort to "implementation defined" (i.e., Your
    Manufacturer May Vary) as a catchall. This was, in part, the reason
    prompting my initial query (i.e., how have folks been surprised by
    these "undocumented areas").

    On my next re-read of those sections, I will keep your comments handy
    and see if I can "rationalize" them in the context of ARM's caveats...

    Which returns to the question above (re: different I/O devices giving
    you problems vs. *all* of them)

    [I assume each of the devices are multi-core?]

    Understood. There is, also, always the incentive to "just get it to
    work" (and not really worry about the "why")

    My opinion of TI has steadily declined over the last ~30+ years. But,
    that is true, perhaps, of most of the "legacy" semi houses. They
    seem oblivious to the issues that have allowed all these "upstarts"
    to nibble away at a market within which they used to be Leviathan! :<

    (sigh) I expended a fair bit of effort to rid myself of their
    mailing and emailing lists. Very uncooperative. "Screw the
    customer" attitude.

    Thankfully, email addresses can be discarded easily and you
    can always rent a *different* POBox (notifying ONLY those mail
    and email contacts with which you want to remain in contact)

    OTOH, I can recall having a fair number of export issues in the
    past with projects and products that one would *think* would be
    unencumbered. :(

    <shrug> Perhaps they were just wanting to harvest "marketing"
    information under the guise of "security". Or, just some
    mindless dweeb who has been *told* to ask those questions

    ("Um, why do you need to know my name and address before you will
    *show* me some shoes? You know, this OTHER vendor didn't annoy
    me with these questions; perhaps I should just shop *there*?")

    Have you revisited the ARM docs to see if they shed extra light on
    your observations?

    I suspect that's partially related to the RISC-CISC issue -- as
    you start demanding more performance, RISC starts acquiring more
    complex mechanisms instead of strictly adhering to the "RISC mantra".

    :>

    --don
     
    Don Y, May 10, 2014
    #12
  13. I have not seen any indications that some devices in the peripheral
    address space need different MMU attributes from other devices
    within that same address space.

    I was doing something really simple - outputting characters on a UART in
    polling mode - when I got the lockup.

    I did exactly the same. I started with the virtual memory sections of
    the ARMv7 architecture manual, then the same sections in the Cortex-A8
    architecture manual and then finally the specific information in the
    AM3359 TRM.

    No. Both the A10s and AM3359 are single core devices.

    Several times. I am yet to locate the tangential reference in the
    second sentence of the third paragraph on page 1234 which explains
    the issue. :)

    Simon.
     
    Simon Clubley, May 12, 2014
    #13
  14. Don Y

    Hi Simon,

    On 5/11/2014 5:20 PM, Simon Clubley wrote:

    [attrs elided]
    Fair enough.

    By "lockup", do you mean the UART stopped behaving properly (i.e.,
    not like your code *expected* it to behave)? Or, did the processor
    actually stop fetching opcodes... ?

    I.e., could the problem be explained/duplicated by the UART
    "disappearing" from the memory map?

    Ah, OK.

    Page 1234 in *which* document? I.e., the cited page in the Arch Ref Man
    (DDI 0406C.b) deals with floating point support... :-/

    I'll try to look at this more later this week. I have some commitments
    to attend to over the next few days...

    --don
     
    Don Y, May 13, 2014
    #14
  15. One test involved a blinking LED on a GPIO line at the same time.

    That LED stopped blinking as soon as the MMU was enabled, so this
    appears to have been a general lockup.

    I guess my British humour was too subtle. :)

    The page number was not intended to be a literal page number.

    I was making a comment/joke on the tendency of manufacturers to bury
    a critical insight into a device right in the middle of a huge document...

    Simon.
     
    Simon Clubley, May 13, 2014
    #15
  16. Don Y

    Hi Simon,

    Having written thousands of pages of documentation, I can attest that
    making information easily available to the folks likely to go looking
    for it is very difficult! (and rarely "rewarded")

    Some folks need a document that you can slog through sequentially:
    do this, then do that. Other folks want a document organized as
    a "reference" of sorts -- that they can readily *remind* themselves
    of some particular fact. Still others are unconcerned with the
    "obvious" aspects of the object under discussion and, instead, want
    all the idiosyncrasies brought to the fore.

    Perhaps we need entries in the index labeled "implementation defined"?

    <grin>

    [There are sure a lot of them in just the memory management chapter
    of the arch ref man!]

    At the end of the day, your point is well taken: just because something
    *seems* like it SHOULD work, I should be prepared for it NOT to work
    and ready to tweak my expectations accordingly. :<

    Thanks!
    --don
     
    Don Y, May 15, 2014
    #16