
ElectricFence Exiting: mprotect() failed: Cannot allocate memory

Discussion in 'Embedded' started by Bill, Oct 14, 2008.

  1. Bill

    Bill Guest

    I am using electric fence 2.1.13 to try to find a memory allocation
    problem that occurs after my application runs for about 3 hours. When
    I link to the electric fence library, I get "ElectricFence Exiting:
    mprotect() failed: Cannot allocate memory" during initialization.
    Could this be the source of the error that takes 3 hours to occur? I
    wonder because all I see at this point is a 12 byte malloc.

    According to a comment in efence.c, "On some systems it will be
    necessary to increase the amount of swap space in order to debug large
    programs that perform lots of allocation, because of the per-buffer
    overhead." How does one increase the amount of swap space? I am
    running Linux 2.6.26 on an MPC8248.
     
    Bill, Oct 14, 2008
    #1

  2. I doubt that's the source of the error that takes 3 hours to occur.
    I would recommend doing invasive debugging on a test system with
    significant additional memory. It's hard to help you without knowing
    more about your hardware. Do you have a hard drive? Do you have USB
    ports? How much memory do you have?

    DS
     
    David Schwartz, Oct 14, 2008
    #2

  3. Bill

    Bill Guest

    After about 3 hours, the program seg faults when trying to malloc
    65K bytes. At the time, according to top, there is plenty of memory
    available.

    I tried using valgrind but it slowed down my application so much that
    it was useless.
     
    Bill, Oct 14, 2008
    #3
  4. Bill

    Bill Guest

    I have a total of 128 MB of flash on my target board. No USB ports.
    Monitoring top, it does not appear that memory is being leaked, but it
    is behaving as if running out of memory. Is there a better way than
    top to monitor memory?
     
    Bill, Oct 14, 2008
    #4
  5. Apparently you do not have even 64 KiB of _contiguous_ virtual memory
    available, but only a huge number of smaller fragments all over the
    memory. I guess that the system would run a few hours longer, if the
    largest allocation was 8 KiB :).

    Sounds like a typical dynamic memory fragmentation problem.

    The other alternative, if the stack and dynamic memory occupy the same
    memory area (one growing upwards and the other downwards) is that the
    stack size is constantly increasing due to a programming error,
    finally inhibiting the growth of the heap.

    Paul
     
    Paul Keinanen, Oct 14, 2008
    #5
  6. Still assuming your description is correct, it is behaving as if the
    malloc-code made an invalid memory access because of a corrupted
    pointer inside the heap. But you can easily verify if the allocation
    should have succeeded, i.e. if there was a contiguous area of at least
    64K of 'unused VM' available:

    1. Modify the segfault handler in the kernel to send a SIGSTOP
    instead of a SIGSEGV.

    2. Use pmap to inspect the address space layout of the affected
    process after it has been stopped by the signal.
     
    Rainer Weikusat, Oct 14, 2008
    #6
  7. John Reiser

    John Reiser Guest

    ... a memory allocation
    Under glibc, setting the shell environment variable "export MALLOC_CHECK_=2"
    [note the trailing underscore] performs additional internal consistency checks
    that are relatively inexpensive. Run "info libc" then search for MALLOC_CHECK_.

    man swapon # how to increase swap space.
    /proc/<pid>/maps reveals summary information for one process.
    /proc/<pid>/smaps reveals more details for one process.
    /proc/meminfo reports a system-wide summary.
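
    The maps file can also be scanned programmatically. Below is a minimal
    sketch in C, assuming the usual Linux "start-end perms ..." line format
    with mappings listed in ascending address order. The function name
    largest_gap is made up for illustration. It reports the largest
    unmapped hole between consecutive mappings, which is one way to check
    whether a 64K request could still find contiguous address space. It
    ignores space below the first and above the last mapping.

```c
#include <inttypes.h>
#include <stdio.h>

/* Return the size in bytes of the largest unmapped gap between
 * consecutive mappings in a /proc/<pid>/maps style stream, or 0 if
 * fewer than two mappings could be parsed.
 * Assumption: each line starts with "<hex start>-<hex end>". */
uint64_t largest_gap(FILE *maps)
{
    char line[1024];
    uint64_t prev_end = 0, best = 0;
    int have_prev = 0;

    while (fgets(line, sizeof line, maps) != NULL) {
        uint64_t start, end;

        /* Skip anything that does not look like a mapping line. */
        if (sscanf(line, "%" SCNx64 "-%" SCNx64, &start, &end) != 2)
            continue;

        if (have_prev && start > prev_end && start - prev_end > best)
            best = start - prev_end;

        prev_end = end;
        have_prev = 1;
    }
    return best;
}
```

    A caller would typically open "/proc/self/maps" (or the maps file of
    another pid) with fopen(), pass the stream to largest_gap(), and
    compare the result against the size of the failing request.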

    --
     
    John Reiser, Oct 14, 2008
    #7
  8. The OP is using a PPC-based SoC. I doubt that he has any swap space on
    board.
     
    Rainer Weikusat, Oct 14, 2008
    #8
  9. Bill

    Bill Guest

    Below is what pmap -x gives for the process (snmpd) upon failing at a
    call to malloc for 65536 bytes. Does anything here indicate a
    possible problem with trying to malloc 65536 bytes? It should be noted
    that a call to pmap -x before the failure, while snmpd was still
    running, gave identical results. Therefore, I wonder if the cause of
    the problem can be seen here?


    Address Kbytes RSS Anon Locked Mode Mapping
    0f8b8000 64 - - - r-x-- libresolv-2.6.so
    0f8c8000 252 - - - ----- libresolv-2.6.so
    0f907000 4 - - - r---- libresolv-2.6.so
    0f908000 4 - - - rwx-- libresolv-2.6.so
    0f909000 8 - - - rwx-- [ anon ]
    0f91b000 16 - - - r-x-- libnss_dns-2.6.so
    0f91f000 252 - - - ----- libnss_dns-2.6.so
    0f95e000 4 - - - r---- libnss_dns-2.6.so
    0f95f000 4 - - - rwx-- libnss_dns-2.6.so
    0f970000 40 - - - r-x-- libnss_files-2.6.so
    0f97a000 252 - - - ----- libnss_files-2.6.so
    0f9b9000 4 - - - r---- libnss_files-2.6.so
    0f9ba000 4 - - - rwx-- libnss_files-2.6.so
    0f9cb000 28 - - - r-x-- librt-2.6.so
    0f9d2000 252 - - - ----- librt-2.6.so
    0fa11000 4 - - - r---- librt-2.6.so
    0fa12000 4 - - - rwx-- librt-2.6.so
    0fa23000 1264 - - - r-x-- libc-2.6.so
    0fb5f000 252 - - - ----- libc-2.6.so
    0fb9e000 8 - - - r---- libc-2.6.so
    0fba0000 12 - - - rwx-- libc-2.6.so
    0fba3000 12 - - - rwx-- [ anon ]
    0fbb6000 12 - - - r-x-- libEclipseHms.so
    0fbb9000 252 - - - ----- libEclipseHms.so
    0fbf8000 4 - - - rwx-- libEclipseHms.so
    0fc09000 16 - - - r-x-- libEclipseVer.so
    0fc0d000 252 - - - ----- libEclipseVer.so
    0fc4c000 4 - - - rwx-- libEclipseVer.so
    0fc5d000 12 - - - r-x-- libEclipsePai.so
    0fc60000 252 - - - ----- libEclipsePai.so
    0fc9f000 4 - - - rwx-- libEclipsePai.so
    0fcb0000 8 - - - r-x-- libEclipseCil.so
    0fcb2000 256 - - - ----- libEclipseCil.so
    0fcf2000 4 - - - rwx-- libEclipseCil.so
    0fd03000 12 - - - r-x-- libEclipseConf.so
    0fd06000 256 - - - ----- libEclipseConf.so
    0fd46000 4 - - - rwx-- libEclipseConf.so
    0fd57000 16 - - - r-x-- libEclipseSem.so
    0fd5b000 252 - - - ----- libEclipseSem.so
    0fd9a000 4 - - - rwx-- libEclipseSem.so
    0fdab000 12 - - - r-x-- libEclipseLog.so
    0fdae000 256 - - - ----- libEclipseLog.so
    0fdee000 4 - - - rwx-- libEclipseLog.so
    0fdff000 8 - - - r-x-- libEclipseLst.so
    0fe01000 252 - - - ----- libEclipseLst.so
    0fe40000 4 - - - rwx-- libEclipseLst.so
    0fe51000 80 - - - r-x-- libpthread-2.6.so
    0fe65000 256 - - - ----- libpthread-2.6.so
    0fea5000 4 - - - r---- libpthread-2.6.so
    0fea6000 4 - - - rwx-- libpthread-2.6.so
    0fea7000 8 - - - rwx-- [ anon ]
    0feb9000 640 - - - r-x-- libm-2.6.so
    0ff59000 252 - - - ----- libm-2.6.so
    0ff98000 4 - - - r---- libm-2.6.so
    0ff99000 12 - - - rwx-- libm-2.6.so
    0ffac000 12 - - - r-x-- libdl-2.6.so
    0ffaf000 252 - - - ----- libdl-2.6.so
    0ffee000 4 - - - r---- libdl-2.6.so
    0ffef000 4 - - - rwx-- libdl-2.6.so
    10000000 1192 - - - r-x-- snmpd
    10169000 32 - - - rwx-- snmpd
    10171000 552 - - - rwx-- [ anon ]
    30000000 116 - - - r-x-- ld-2.6.so
    3001d000 24 - - - rw--- [ anon ]
    30023000 4 - - - r--s- [ shmid=0x0 ]
    30024000 4 - - - rw--- [ anon ]
    30025000 4 - - - r--s- [ shmid=0x0 ]
    3005c000 4 - - - r---- ld-2.6.so
    3005d000 4 - - - rwx-- ld-2.6.so
    3005e000 4 - - - ----- [ anon ]
    3005f000 8188 - - - rw--- [ anon ]
    3085e000 4 - - - ----- [ anon ]
    3085f000 8188 - - - rw--- [ anon ]
    7ff61000 332 - - - rw--- [ stack ]
    -------- ------- ------- ------- -------
    total kB 25084 - - -
     
    Bill, Oct 14, 2008
    #9
  10. CBFalconer

    CBFalconer Guest

    Please do not top-post, but do snip properly. Your answer belongs
    after (or intermixed with) the quoted material to which you reply,
    after snipping all irrelevant material. This gives prospective
    repliers a fighting chance at understanding the thread. See the
    following links:

    <http://www.catb.org/~esr/faqs/smart-questions.html>
    <http://www.caliburn.nl/topposting.html>
    <http://www.netmeister.org/news/learn2quote.html>
    <http://cfaj.freeshell.org/google/> (taming google)
    <http://members.fortunecity.com/nnqweb/> (newusers)
     
    CBFalconer, Oct 14, 2008
    #10
  11. Bill

    Bill Guest

    A backtrace in the SIGSEGV signal handler I put into the application
    points to the line where the malloc occurs. There is an if statement
    to check for a NULL pointer and print a message if malloc returned a
    NULL pointer. No message is printed.
    1. Valgrind slows down the application too much to be effective.
    2. efence exits during initialization with the "Exiting: mprotect()
    failed: Cannot allocate memory" error.
    3. I am running a test right now with MALLOC_CHECK_=2 and will examine
    the results in the morning.
     
    Bill, Oct 15, 2008
    #11
  12. John Reiser

    John Reiser Guest

    A backtrace in the SIGSEGV signal handler I put into the application
    Beware of the possibility of buffering. Consider the program below.
    When run interactively with stdout connected to a terminal, then:
    f(10) is NULL.
    Segmentation fault
    where the first line is unbuffered stdout from the program,
    and the second line is unbuffered stderr from the shell.
    When run with stdout re-directed into a regular file, then you see only:
    Segmentation fault
    on stderr, and the file is *empty* ["No message is printed."]
    because the buffer was not flushed. So remember fflush().

    -----
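    #include <stdio.h>

    /* Stand-in for an allocator that fails: always returns NULL. */
    char *f(int a)
    {
        (void)a;
        return NULL;
    }

    int main(void)
    {
        char *p = f(10);
        if (NULL == p) {
            printf("f(10) is NULL.\n");
            /* fflush(stdout); THE FIX */
        }
        return *p;  /* deliberate NULL dereference: crashes with SIGSEGV */
    }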
    -----
     
    John Reiser, Oct 15, 2008
    #12
  13. The [ anon ] mapping right after snmpd (at 10171000) should be the
    'regular heap' of the application (the area used by brk/sbrk). Its
    present size is 552K and it could grow by about another 510M
    (0x30000000 - 0x101fb000 = 0x1fe05000 bytes) until it would 'hit'
    ld-2.6.so (brk/sbrk would then fail with ENOMEM and return -1).

    The two 8188K segments, each preceded by a single page w/ 'no access',
    are most likely (userspace) NPTL-stacks for two threads (default NPTL
    thread stack size is 8M, the lowest 4K are used as guard page so that
    an access beyond the bounds of one stack causes a [MMU] exception
    instead of overwriting data on the other stack). These stacks are
    allocated by calling mmap with MAP_ANON. There is still plenty of
    space for other anonymous mappings between the end of the highest of
    these mappings (0x3105e000) and the lowest presently used address of
    the conventional 'stack segment' (0x7ff61000).

    Unless I am very much mistaken, this process should certainly be
    capable of allocating more virtual memory using either brk/sbrk or
    mmap.

    BTW, while getting non-spam e-mails at least occasionally is nice :),
    I usually read postings in the groups I frequent, except insofar
    'certain posters', whom I deem to be more of an annoyance than an
    information source, will be filtered by my newsreader.
     
    Rainer Weikusat, Oct 15, 2008
    #13
  14. That may be your experience but personally I find it incredibly
    useful for certain classes of problems. Maybe not high-level stuff
    or full applications but for low level data structure test beds I
    find you can literally do in a morning what may take a week otherwise.
     
    Andrew Smallshaw, Oct 15, 2008
    #14
  15. Bill

    Bill Guest


    When it crashes, I get a SIGSEGV signal with an si_code of SEGV_MAPERR
    and si_addr of 0x2d. What does address 0x2d represent? Is the
    problem that address 0x2d is not in the ranges shown in pmap?
     
    Bill, Oct 15, 2008
    #15
  16. SEGV_MAPERR means a page fault when accessing an unmapped page.
    si_addr is the address that the program tried to access.
    Well, sort of. 0x2d isn't in that range because that page isn't mapped.
    But it's not supposed to be mapped. The first page of virtual memory is
    always unmapped, so that NULL pointer dereferences generate faults. So
    it's an address that can't possibly be valid. If the crash is inside
    malloc, as you said earlier, then most likely some pointer in malloc's
    data structures got overwritten with 0x0000002d.

    If you have a core dump, you might be able to trace backwards a little
    ways to figure out where this pointer itself is located. If you
    recognize the data around it, it might suggest to you what part of your
    program could be guilty of overwriting it. (As a start, 0x2d is ASCII
    '-'. Any part of your program use hyphens?)
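
    For reference, a handler that reports si_code and si_addr might look
    like the sketch below. It assumes a POSIX system with sigaction() and
    SA_SIGINFO. The names fmt_hex, emit and install_segv_handler are
    illustrative only and are not taken from the original program. It uses
    write() rather than printf() so that it depends neither on stdio
    buffering nor on a heap that may already be corrupted.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Format v as "0x<hex>" into buf and return the string length.
 * buf must hold at least 2 + 2 * sizeof(uintptr_t) + 1 bytes. */
size_t fmt_hex(uintptr_t v, char *buf)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[2 * sizeof v];
    size_t n = 0, len = 0;

    do {                        /* collect digits, least significant first */
        tmp[n++] = digits[v & 0xf];
        v >>= 4;
    } while (v != 0);

    buf[len++] = '0';
    buf[len++] = 'x';
    while (n > 0)               /* emit them most significant first */
        buf[len++] = tmp[--n];
    buf[len] = '\0';
    return len;
}

/* Best-effort write to stderr; nothing useful can be done on failure. */
static void emit(const char *s, size_t n)
{
    ssize_t r = write(STDERR_FILENO, s, n);
    (void)r;
}

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
    char buf[2 + 2 * sizeof(uintptr_t) + 1];
    const char *why = info->si_code == SEGV_MAPERR ? "SEGV_MAPERR at "
                    : info->si_code == SEGV_ACCERR ? "SEGV_ACCERR at "
                    : "SIGSEGV at ";
    size_t len;

    (void)sig;
    (void)ctx;
    emit(why, strlen(why));
    len = fmt_hex((uintptr_t)info->si_addr, buf);
    emit(buf, len);
    emit("\n", 1);
    _exit(1);                   /* do not return into the faulting code */
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_segv_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

    Calling install_segv_handler() once at startup is enough. A faulting
    address such as 0x2d would then be printed together with the reason
    code before the process exits.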
     
    Nate Eldredge, Oct 15, 2008
    #16
  17. CBFalconer

    CBFalconer Guest

    Take a look at the description of the debug facilities in
    nmalloc.txh. That is part of nmalloc.zip, and is the source for
    the info documentation of nmalloc. nmalloc, in turn, is almost
    pure standard C, but relies on the system sbrk() to get memory
    space, and makes some (quite usual) assumptions about memory. See:

    <http://cbfalconer.home.att.net/download/nmalloc.zip>
     
    CBFalconer, Oct 16, 2008
    #17
  18. This is still not very good. What if 'printf' needs to allocate memory
    to do its job? What if 'fflush' does? In an error handler like this,
    you are better off calling 'write' directly.
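
    A sketch of what that could look like, assuming a POSIX system.
    write_all is a hypothetical helper name. It loops because write() may
    transfer fewer bytes than requested, and it retries on EINTR. It
    touches neither stdio buffers nor the heap.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Write the whole NUL-terminated string msg to fd using write() only.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_all(int fd, const char *msg)
{
    size_t len = strlen(msg);
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, msg + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

    In the error branch of the test program this would replace the
    printf()/fflush() pair, e.g.
    write_all(STDERR_FILENO, "f(10) is NULL.\n");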

    DS
     
    David Schwartz, Oct 16, 2008
    #18
  19. This means roughly 'it is more likely that the system ran out of
    memory than that the application contained a programming error'.
    But this is again a question which can be answered very simply: Check
    the PC/IP value at the time of the segfault. That's either within
    malloc (as the OP has repeatedly claimed) or within application code.

    So, what is it?
     
    Rainer Weikusat, Oct 16, 2008
    #19
  20. Another option is to force a segfault, i.e. assign to the area when the
    pointer is null. The values of the various CPU registers, especially
    the program counter, can then (in combination with a disassembler) be
    used to determine the location of the crash and hence, the condition
    at the time of the test.
     
    Rainer Weikusat, Oct 16, 2008
    #20
