1. This forum section is a read-only archive which contains old newsgroup posts. If you wish to post a query, please do so in one of our main forum sections (here). This way you will get a faster, better response from the members on Motherboard Point.

"Efficient" timer implementations

Discussion in 'Embedded' started by D Yuniskis, Dec 6, 2010.

  1. D Yuniskis

    D Yuniskis Guest


    In my latest RTOS, I have timers on every service (i.e.,
    every service invocation can return ERROR_TIMEOUT in
    addition to SUCCESS or FAILURE). But, the common
    approaches to providing timers (not just timeOUTs)
    don't "feel right" when applied consistently like this.
    I'm wondering if there is something I can exploit to
    "cut a corner" in my implementation...

    E.g., allocating a "pool" of timers requires careful
    sizing of that pool -- lest a service end up having
    to support a NO_TIMER_AVAILABLE error code (frown).
    OTOH, dynamic allocation (potentially) ends up with
    *lots* of timers having to be managed (overhead).

    Consider, most timers will (presumably) NOT expire
    in normal use so the cost of maintaining them should
    be low -- foolish to allocate lots of resources to
    something that you hope not to "use".

    The idea of a "Timer per Service" doesn't work because a
    service can be multithreaded (in "reality" or in "effect").
    I.e., that would serialize all accesses to the service.

    A "Timer per Thread" is a potential solution -- *if*
    you also allow "general purpose" timers to coexist
    with that "service timer". (i.e., a thread could
    want to start a general purpose timer, *then* invoke
    a service with another "timeout" WITHOUT having to
    explicitly share that "general purpose" timer with
    the service provider).

    [N.B. this is an option but adds a lot of complexity
    to timer management -- and would limit the number of
    such general purpose timers that the thread could

    [Also, note that this could complicate the implementation
    of *layered* services]

    I could potentially trade away the ability to infer
    *elapsed* time from the examination of a (running)
    timer -- but, it is imperative that a timer be an
    independant entity in its own right (e.g., so you
    can pass the "time remaining" from one *completed*
    service call to the *next* service called).

    Or, is this just one of those areas where there is no
    "clever" solution?

    D Yuniskis, Dec 6, 2010
    1. Advertisements

  2. I don't quite understand your constraints. Is it the overhead of
    managing the very large set of timers? The usual priority queue is
    O(log(n)) in the number of timers in inserting, deleting and firing

    An alternative, if you have a fixed tick (say 1ms), is to use a series
    of groups of exponentially increasing buckets for timers. For
    example, have 128 buckets for timers at 1ms resolution, then 128 at
    64ms resolution, 128 at 4096ms... Structure each tier as a circular
    list of buckets. Insert a new timer in the linked list for the best
    bucket (IOW, if the new timer expires in the next 128ms, insert into
    the appropriate 1ms bucket, if it expires in more than that, but less
    than 8192ms, use the 64ms buckets). Every tick you just fire all the
    timers in the next 1ms bucket. Every 64ms redistribute the timers in
    the next 64ms bucket into the 1ms buckets, repeat every 4096ms for the
    next bucket in the third (4096ms) tier, etc.

    Basically that does one linked list insertion and deletion for timers
    firing in the next 128ms, 2 for those in the next 8192ms, 3 for 524s,
    etc. A handful of tiers gets you to timers as long as you practically
    need (for the above parameters, seven tiers gets you to 278 years).

    The simple design has one flaw, in that you get big bursts of
    redistribution work at the 2**6, 2**12, 2**18... ticks, but you can
    distribute that without too much added complexity by redistributing on
    each tick, but only ceil(1/64th) (etc.) of the contents of the bucked
    being redistributed.

    Obviously the length of the clock tick and the number of buckets in
    each tier can be varied (although reducing the size of the tiers
    increases the number of redistributions done over the life of a
    robertwessel2, Dec 7, 2010
    1. Advertisements

  3. D Yuniskis

    D Yuniskis Guest

    Hi Robert,

    This is similar to my current implementation -- however, I don't
    rigidly define the "intervals" in each "bucket".

    E.g., currently, a thread defines a Timer, then passes that
    timer to a service invocation and blocks on the result. If
    the service completes successfully, you can chose to examine
    the Timer (i.e., to determine how long you have blocked), can
    discard it or can pass the "time remaining" on to some subsequent
    service invocation (typically some OTHER service). If the
    service failed due to a TIMEOUT, you implicitly know that the
    Timer has "0" left on it. If the service failed for other
    reasons, then you address those problems.

    [Note that the timer presently exists in user space!]

    The RTOS keeps all "running" Timers on a queue sorted by
    expiration time. It separately manages a (hidden) list
    of pointers *into* that queue to expedite insertion, etc.
    The size of this "list of pointers" varies as a function
    of the number of active Timers. This lets the maximum
    "work" performed for an insertion be bounded by limiting
    the length of the longest (sub)queue that has to be parsed.

    E.g., if there are 45 active timers, I could opt to mark
    the list of 45 timers in 4 groups of ~12 timers, each -- note
    that the time interval represented in each such "subqueue"
    varies based on the times present *on* the timers in that
    "subqueue". If the number of timers suddenly increases to
    136, these 4 groups would mean the length of each "subqueue"
    has now increased to ~35 from that ~12 -- and the average
    time for parsing a queue would increase threefold. In this
    case, the OS can lengthen the "hidden list"... maybe opting
    for 10 groups of timers.

    Maintaining these lists incrementally is relatively
    inexpensive -- if a timer is deleted/inserted, at most one
    timer shifts into/out of each group (i.e., the whole group
    need not be reexamined/reshuffled).
    I opted for the "cut into N timers" instead of "N intervals"
    as most timers will tend to be similar time intervals. I.e.,
    there won't be any in the "278 year" tier. :> And, "steady
    state", threads will keep recreating the same timers over
    and over again as they keep accessing the same *services*
    over and over again, etc. [this is actually a falsehood but
    the details are unimportant, here]

    I store only deltas in each "timer" so scheduling the next
    "timer expiration" is easy -- just look at the delta in the
    timer at the head of the queue. OTOH, finding the expiration
    time (i.e., "time remaining") of the Nth timer in the queue
    requires examining all of the timers expiring *before* it.
    [This shows the bias towards short intervals and the fact that
    most timers will be used as timeOUTs.]
    In my scheme, you have to do a "redistribution" for each G timer
    "inserts + deletes" (where G is the number of groups in the
    hidden list).

    Having said all that... :>

    The bigger question: is there a way of bounding the number
    of timers so that I can determine implementation costs at
    design time instead of run-time? e.g., my suggestion of a
    "Service Timer" (per thread) seems like this does that
    it would require each service to execute on the invoking
    thread's stack -- or, force constraints on the services
    regarding what they can use, etc. (or, otherwise constrain
    the design of the service e.g., thread per client)

    [as I said before, this doesn't address the "general purpose
    D Yuniskis, Dec 7, 2010
  4. D Yuniskis

    Arlet Ottens Guest

    You could make all the threads responsible for managing their own memory
    resources. Assuming that you're working in C, a timer object could be a
    struct that includes the linked-list pointers. For example:

    struct timer
    struct timer *prev;
    struct timer *next;

    int period;
    // whatever else you need for the timer object

    In the thread, you then have the option of allocating timers statically,
    dynamically, or even on the stack:

    f( )
    struct timer x;

    init_timer( &x, ... );
    wait_timer( &x );
    del_timer( &x );

    The RTOS kernel would then just use the prev, next pointers of the timer
    struct, so you can avoid any additional memory allocation in the kernel.
    Arlet Ottens, Dec 7, 2010

  5. The scheme with the fixed buckets has a couple of advantages. First,
    there's no real computation done at the tick - you just grab the
    linked list of timers in the next 1ms bucket (and the list heads are a
    circular structure, so there's never any allocation of new list
    heads), fire them all, and you're done. A timer that's abandoned
    before being fired (as you'd expect for most timeouts) is just a
    deletion from its current linked list. Second, the cost of managing
    the timers is basically fixed based on the interval set, and is
    basically always a small integer number of liked list insertions and
    deletions (no more than seven for the aforementioned 278 years). The
    only time you compute the time is at the initial insertion of the
    timer or at one of its redistributions (FWIW, the redistributions are
    marginally simpler, since you don't have to decide which tier you're
    going to insert into - always the next finer one). The number of
    timers in the system does not impact the cost per timer at all.

    If many of yours timer are timeouts which do not need fine resolution
    and wide range, you could have a second set of buckets at (say) 1s
    resolution, and enough of them to cover almost all of your timeouts,
    and then you don't even do any redistributions, although they'll fire
    in bigger clumps. OTOH, if most timers do not actually fire, some
    judicious picking of tier sizes can ensure the same thing - in the
    above, for example, if most timers are between 8s and 262s, and are
    abandoned at least 8192ms before they expire, they're never
    redistributed (and one that do live long enough to fire, would be
    redistributed twice).

    You could embed a timer object in each object that might need one.
    They're not necessarily very big. A pair of linked list pointers, a
    pointer to the fire routine, a word of user data and a ~64 bit time.
    So ~24 bytes on a 32 bit system. Obviously you might have timers
    unrelated to a specific object, and embedding them somewhat assumes a
    single address space OS. Of course that just moves the problem
    elsewhere, but presumably you’re having to deal with the allocation of
    those other objects anyway.
    robertwessel2, Dec 7, 2010
  6. D Yuniskis

    Marc Jet Guest

    In my own embedded OS, which evolved over many years and several
    platforms, I found the following the most simple yet flexible

    Depending on the hardware, I identify the smallest time granularity.
    This could be CPU cycles for platforms with a cycle counter, or the
    timer register readable state (after prescaler). This I define my
    software timer granularity, and choose a type big enough to last
    "forever". 64 bits have been enough for my projects so far, and
    fortunately a 64 bit "unsigned long long" is supported by most
    compilers for register passing and function result. This type I
    define as "tim64".

    The HAL provides functions timGet() which assembles the current time
    as tim64 (sometimes just from hardware registers, sometimes through
    race resolution between hardware timers and ISR software state). Note
    that this is not wallclock time, but just an ever increasing value
    that starts at 0LL during power up. There are also functions to add
    and substract NS, US, MS and seconds, which implicitly contain the
    conversion between hardware cycles and human readable time scales.

    Based on this, I create a "polled" API. It is basically timGet() +
    timElapsed() for knowing how much time has passed since some moment.
    And timSet(ms) + timIsOver() for polled loops (timSet consists of
    timGet + timAdd).

    For asynchronous waiting, my world consists of signal producers and
    signal sensors. A timer can produce a signal (when time is over), and
    the application can sleep until its sensor receives such a signal. My
    timer signal is a small struct which combines the standard signal
    sensor and the tim64 target time. Note that this is an extra API,
    layered on top of the polled one.

    The API provides timSigCreate() to create a signal, in memory provided
    by the application (usually an autovar). With timSigAttach() it is
    queued into a sorted list of active timers, and timSigDetach()
    reclaims ownership of the memory by cancelling/removing from the list.

    The hardware ISR reprograms a hardware timer to trigger when the next
    queued timer signal wants to be awakened, and signals the
    corresponding tasks as their events happen. timSigAttach() and
    timSigDetach() resolve the potential races, and reprogram the hardware
    timer too. Often, the hardware timer has a different granularity than
    tim64, so conversion is necessary here, too.

    The signal sensor stuff is common over many OS services, and a task
    can wait on multiple signal sensors. This enables the timer signals to
    be used as timeouts for services that are themselves asynchonous.

    The philosophy of this is to push the storage burden to the
    application that requests a service. This helps scaling a lot, and
    there is almost no "unjustified" overhead.

    For the polled API just CPU registers or an autovar is enough. Time
    handling can often consist of just a few hardware register reads with
    no or little conversion. By not using the other API, a statically
    linked image can be tiny.

    On the other hand, there is absolutely no limit for asynchronous
    timers. Yet they can still be efficiently mapped into the .bss
    section. They can live as autovar in a "function with timeout", or as
    member of a handle that your service provides. There is no dynamic
    memory allocation involved (unless you do), and no "out of memory"

    I found that this approach is so much lighter on resources, and
    outweights the disadvantages (which is the memory ownership transfer
    in timSigAttach/Detach and therefore required programming discipline).

    Let me know (here) when you found this useful.

    Best regards
    Marc Jet, Dec 7, 2010
  7. D Yuniskis

    D Yuniskis Guest

    Hi Mark,

    I *used* to use a similar approach -- though much less fine
    grained (i.e., I *set* the granularity of my "time" to
    the finest that I would need -- almost always in bogounits).
    "Time" was always an integer (of varying size), etc.
    Similarly, "forever" was simply the longest time in which
    I was interested -- for "system" time (ignore wall clock
    for this discussion).

    In *this* application (media clients), I have no need for
    "system time". All time is timeouts and (elapsed) timers.
    I.e., nothing ever happens *at* a particular time but,
    rather, some time "from now" (so, the whole "set time",
    "wait for time", etc. API goes away -- along with the problems
    it brings).

    I have several "clocks" (cf. Mach) that can be used to drive
    timers. Most are virtual clocks derived from real-time "events".
    The precision of these varies as appropriate to their source.
    E.g., if a DHCP client wants to wait 30 seconds for a timeout,
    it *probably* doesn't care if that is 30.0000000000 seconds,
    29.9099325 seconds or "31" seconds. (i.e., that is how I apply
    the various clocks).

    OTOH, the audio streamer very much cares how precisely *this*
    next 44KHz sample is pushed out to the D/AC. At the same time,
    it probably has no interest in anything as far distant as
    "30 seconds hence".

    The application is carefully structured so that individual aspects
    of it need not be concerned with more than one sense of time.
    This eliminates/minimizes synchronization and conversion errors.

    Time is *not* an integral data type, however (despite being
    implemented as an "integer"). I used fixed point representations
    as they allow me to adjust the various FLL's and PLL's more
    sanely (simplifies the math). This allows me to keep the
    clock(s) in question syntonous with those on other clients
    while simultaneously giving me a lot of flexibility on phase
    control (it's just an offset in the math).

    As a result of all this, I can shrink "time"s to 32b datatypes
    (I am struggling to get them down to 16 -- probably an unnecessary
    optimization). This is effectively "forever" in terms of an
    of the application components' usage of them (e.g., 2G seconds?
    2G milliseconds? 2G video frames? etc.)
    Again, I have no need for "human readable" time units. Just like
    an engine controller cares little about "seconds" or "time of day"
    but, rather, how much BTDC it has to fire the injectors and
    detonate the plug. Bogounits prevail.
    Do you put the Scheduler into the same set of signal consumers?
    (that would be cool as it would let scheduling be treated as just
    another "thread")
    My (current) API provides a "create timer" hook. The user
    can optionally provide a pointer to a "timer struct" that
    it would like used for the timer's representation. However,
    the system can ignore this -- regardless, a handle for
    the timer is returned (if created). This lets me keep the
    timer "precious" if I want to (the API is common to a variety
    of different RTOS's I've designed over the years -- it's just
    a handy abstraction). It also lets me change the representation
    of the timer without the application being concerned (or bothered!)
    by that change.
    My timeouts live outside the scope of the function (service)
    invoked. E.g., I can do:

    Timer foo;
    Timer handle = Create_Timer(&foo, SOME_AMOUNT_OF_TIME, ...);

    if (TIMEOUT == Get_IP_Address(&foo, ...)) {
    /* didn't get IP address in time available; bad DHCP server? */

    if (TIMEOUT == Get_Program_Image(&foo, IPaddr, ...)) {
    /* didn't get program image in remaining time available; die */

    if (TIMEOUT == Initialize_System(&foo, ...)) {
    /* resources not available in time remaining; gasp! */

    log("startup completed with %d remaining", Get_Remaining(&foo));

    If I wrap this in a routine (e.g., "System_Startup"), then
    foo could be autovar (if the RTOS complies) -- assuming the
    rest of the system cares nothing about foo outside of this
    scope. Furthermore, if System_Startup had some tear-down
    times associated with it and was, itself, constrained by
    an imported timer, then it could deduct those times from
    the timer prior to invoking the constituent services so that
    a failure would leave it with sufficient time remaining
    to complete its tear-down actions and still return to its
    caller within the prescribed "time limit".

    [this is intended as an alternative to supporting deadline
    handlers in the RTOS itself]
    It still seems to avoid the (to me) obvious chances for
    optimization/exploitation -- e.g., these timeouts aren't
    "general purpose timers" that can be used to initiate
    events. They are *always* used in the same way -- to
    unblock a thread waiting on some service/resource.
    If all services are synchronous, then a thread can ONLY
    be blocking on one at a time. I.e., there is only need for
    one "service timer" per thread. And, that the use of that
    timer is preordained.

    I think I'll see what the cost of including it as part of
    the thread state and an implicit part of the scheduler gives
    me in terms of performance and cost.
    D Yuniskis, Dec 9, 2010
  8. D Yuniskis

    D Yuniskis Guest

    Hi Arlet,

    Yes, see the description of my "Create_Timer" API (in a reply to
    Mark Jet). I allow threads to *try* to manage the memory
    associated with the timer(s). (the RTOS reserves the right
    of managing them internally, if it chooses). But, doing so,
    *implicitly* makes all timers the same (i.e., it precludes
    any optimization based on the narrowly defined use expected
    of these "service timers"). It's "just another timer" that
    never uses the full capabilities of a "general purpose timer".

    I'm looking for "tricks" that are inherent in this particular
    type of usage that can be exploited for performance/space gains.
    E.g., I'm competing with using deadline handlers *in* the RTOS.
    Obviously, I am hoping to come up with a "cheaper" solution.
    (for some definition of "cheap").
    D Yuniskis, Dec 9, 2010
  9. D Yuniskis

    D Yuniskis Guest

    Hi Robert,

    [chomp (i.e., "big snip" :> )]
    (sigh) I don't think I'm following you. :-/

    Hopefully restating your environment:
    - the tick happens at 1ms intervals
    - all timers for a given "duration" (blech) are linked into a
    "duration-specific" list (i.e., any timer set for 3ms is
    linked into the "3ms list" -- this is just an efficiency
    hack to make it easy to grab *all* of the timers expiring
    at a given "instant"/tick, right?) You call each such
    list a "bucket" (??)
    - these lists are grouped into "tiers": the first tier
    spans the interval (0 - 128ms) and contains 128 lists
    (i.e., a list of 1ms timers, a list of 2ms timers, a list
    of 3ms timers, etc.); the second tier spans the interval
    (128 - 8192ms) and also contains 128 lists (i.e., the
    list of timers expiring in 128-191ms, the list of 192-235ms,
    etc.); the third tier spans (8192- ....
    [I hope my arithmetic is correct at this hour of the morning]
    - within a list, the individual timers are sorted by duration.
    For the lists in the finest grained tier, no sorting is necessary
    as each timer has the *same* duration (i.e., N ms). For higher
    tiers, the durations may vary due to a coarser grading (e.g.,
    the 128ms list contains timers expiring in 128, 129, 130... 191ms)
    [sorting could be donw at insertion or postponed to "redistribution"]

    Is this roughly correct? (I haven't checked the actual tiers; your
    values change between, e.g., 8192 and 4096 later)
    It seems like all the "tiers" buys you is convenient places to
    peek into what would otherwise be a single (sorted) list of
    timer expirations (?). If so, it loses its appeal when the
    range of timeout *values* is reduced -- yet the number of
    timers remains high.

    E.g., if I'm seeing *lots* of timers at a *few* different "values",
    then those might all end up in a single "tier". Or, a "few" tiers.
    So, the improvement the tiers give you is marginal compared to just
    having timers in "identical valued" lists. (?)

    I, for example, just have a single list with "markers" at
    particular points *into* the list (to cut search time).
    For expiration, I just pull timers off the head of the list
    having "0" time on them until I find one that has a nonzero
    (positive) "delta time". That time defines the next time that
    the "timers" will need to be examined (i.e., nothing expires
    now and that "delta time").

    If all equal valued timers were placed in explicit lists,
    then I wouldn't have to examine subsequent timers in my
    *single* list -- I could just pull *the* list of "expiring
    timers" off the head of the queue.

    But, I would then need to walk through *that* list so
    I still have to process each individual timer... :-/
    Timers are finer grained than that. I'm processing audio/video
    so ms and fractional ms are typical event times (though I am
    processing things in large bunches, etc.). "seconds" are
    "an eternity" :>
    Yes, that's my idea of creating a "service timer" for each thread.
    With synchronous services, you would only ever need *one* such
    service timer. And, it could "belong" to that thread. Since
    you *know* how it will be used, you could exploit that knowledge
    in your design of the scheduler, etc. instead of treating it as
    "yet another timer".
    Size, as always, is relative. :> I am trying to prune this
    application down to "nothing" to make it's application ubiquitous.
    E.g., we've gone from "speakers" to "powered speakers"... I
    want to go to "smart speaker" (plug ethernet cable in back
    and you're listening to "whatever"). Ditto for video.
    D Yuniskis, Dec 9, 2010

  10. No. Each bucket contains timer expiring at a certain time, nothing to
    do with their durations.

    So the first tier is a set of 128 buckets, the first of which holds
    timers expiring 1ms from now (IOW, at the next tick), the second 2ms
    from now (second tick), etc. You'd obviously implement that as a
    circular structure, and inserting a timer with an expiration 5ms in
    the future would add that timer to the linked list in the fifth bucket
    from the current start. Then at each tick you just grab the next
    bucket, dispatch all those timers, and step to the next position in
    the circular structure.

    The obvious problem with that is that in general it's impractical to
    have enough buckets to cover all the possible timer expirations (you’d
    need 3.6 million buckets to cover an hour). So instead you have
    addition tiers of buckets at exponentially lower resolutions. So if
    your adding a timer firing in 500ms, which doesn't fit in the 128 1ms
    buckets of the first tier, you add it to the second tier, in the
    seventh bucket. The second tier is composed of 64ms buckets, the
    first of which contains timers that will expire at least 64ms from
    now. Periodically (every 64ms), you take the timers in the first
    bucket in the second tier (which will be expiring in 64-127ms), and
    redistribute them in to the first (1ms) tier, from where they
    eventually get dispatched. The second tier is also a circular
    structure, and gets stepped around after the redistribution.

    Repeat with additional coarser grained tiers as needed.

    So to set a timer expiring 10,000ms from now, you'd add it to the
    second bucket of the third tier. Somewhere between 4096 and 8191
    ticks before that timer is due to fire, the first bucket in the third
    tier (now containing that timer) will be redistributed to the second
    tier. Let's say the redistribution happens 5000ms before expiration,
    then the timer would end up in the 78th bucket in the second tier.
    Later, somewhere between 64 and 127ms before firing, the first bucket
    in the second tier (which now contains the timer in question) gets
    redistributed to the first tier. Assuming my arithmetic is correct,
    and the redistributions are synced on the 64th/4096th/etc. ticks, the
    timer would be moved to the 72nd bucket in the first tier. 72 ticks
    later, the timer would be dispatched.

    As each "bucket" is just the head of a doubly linked list, adding or
    removing a timer from a bucket is a trivial linked list operation. In
    the case of the initial insertion, you need to determine in which tier
    to initially place the timer (timer expires in less than 128ms,
    8192ms, 524,288ms…), and then it's a subtract and divide (by a power
    of 64, in this example) to select the bucket into which the timer is
    inserted. The redistribution between tiers is even simpler, you just
    run through the linked list of timers that need to be redistributed
    (from the first bucket in the tier), and you just insert each of them
    in the appropriate bucket in the next finer tier.

    So the total processing of a timer consists of the initial insertion,
    zero or more redistributions (based on the log64() of how far in the
    future the timer expires), and the final dispatch. All of which are
    simple linked list insertions or just a step through a linked list.
    The total storage required is several tiers of 128 buckets, the
    buckets basically being just the list heads. And as I mentioned,
    seven tiers gets you a couple of centuries of timer. Plus, of course,
    the timers themselves.
    robertwessel2, Dec 9, 2010
  11. D Yuniskis

    Jon Kirwan Guest

    You know your application and I haven't been reading much of
    the thread, waiting to see how it develops. I'm still kind
    of ignorant because I've skimmed. But have you read Comer's
    XINU book and taken into account his chapter on delta queues?
    It almost sounds like you are struggling in that direction to
    me, but only you'd know if that's so. Mostly, I'm curious if
    you'd taken the time to read his book and are familiar with
    the idea enough to discuss it in your context.

    Jon Kirwan, Dec 10, 2010
  12. D Yuniskis

    Marc Jet Guest

    I notice there is some overlap between what we came up with :)
    I don't understand the question. Maybe the comments below resolve it,
    otherwise please elaborate. What can be said is that both the ISR and
    Attach/Detach work at the same abstraction level (they reprogram the
    timer hardware). On the task switching level, the ISR executes
    asynchronously and interacts with the scheduler, while Attach/Detach
    always execute in the context of the calling task. Due to a potential
    race, Detach sometimes also interacts with the scheduler.
    Not necessarily. In the async API described in my previous post, the
    "timer signal" describes a moment in time and triggers when that
    moment is reached. The trigger mechanism is implemented using my
    abstract concept of signal producers (which in this case represents
    the timer ISR side), and the trigger event manifests as signal on the
    sensor (which always represents a task).

    A task can block on one or multiple sensors. The obvious use case for
    blocking on a single sensor is the sleep() function. A use case for
    multiple sensors, is to wait for IO to finish plus a timeout.

    At first sight it doesn't make sense to simultanously block on
    multiple timer signal sensors, because there is always one to time out
    first. But even that is a possible use case! Specifically when a
    function implements own timeouts and ALSO honors timeouts from the
    upper layers (e.g. given as function argument). If it wasn't allowed
    to block on multiple timeouts simultanously, it would have to deep-
    inspect them all and find the first one to expire.

    Note that my signal sensors and producers are generic entities, while
    the timer signal is a specific implementation using them. Although
    it's the most prominent one that I made and use, the producer / sensor
    concept is generic IPC and removes the task scheduling aspects from
    drivers and ISRs.

    BTW your example code snippet, ported to my polled API, would look
    almost identical. Except there is no handle + struct/object, but
    rather the handle IS the struct/object.

    This is a very important thing to understand. There can be as many
    independent time references as the application desires. They can be
    duplicated, passed to other modules, and dropped without caring about
    storage ownership.

    From the multitasking point of view, with the polled API there's no
    effect beyond the current task. With the async API on the other hand,
    there's a common (sorted) list for all timer signals that are
    "attached" to the global producer (which basically is the timer ISR).
    "Attaching" DOES affect beyond the scope of the current application
    (cpu-wise, not memory wise). More timer signals on the list
    translates to slower attaching for everybody, and also more ISR
    Merging my polled API into the thread would not be beneficial, because
    it would prohibit using several timeouts at once. They are useful for
    nested timeouts, e.g. when a subfunction decides to implement a
    timeout without exposing it on the API. Also, in more complex
    statemachines I often find myself using several time references
    simultanously (e.g. last character received, last some_event, last lcd

    If you referred to merging the async timer signals into the thread, it
    may have less negative impact - but still has. It would reduce them
    to really just be "an absolute timeout" for "the current activity".
    It would mean that there (suddenly) must be a strict boundary between
    OS functionality (which unconditionally aborts all action upon
    timeout) and application functionality (which initiates retry
    mechanisms and sane error handling, instead of just passing the
    timeout error upstream). Also, to a lesser degree the previous
    paragraph also applies here.

    Best regards
    Marc Jet, Dec 10, 2010
  13. D Yuniskis

    D Yuniskis Guest

    Hi Marc,

    <grin> If you do this sort of thing long enough, you end up
    touching on damn near every possible implementation strategy
    (unless, of course, you take a pedestrian approach to it and
    "settle" for "the first thing that works" :> )
    I *think* that's close to what I am suggesting (asking) above
    (barring terminology).

    I've taken several approaches to the *basic* structure of
    timer/scheduler over the years (here, I am ignoring implementation
    of the actual timers themselves but, rather, how the "timing
    service" relates to the "scheduling").

    By far, the most common approach has been to create a periodic
    interrupt ("tick") that the hardware uses to signal a *timing*
    service. This directly or indirectly updates the "timers"
    (how is not important). A "special" timer is the "scheduling
    timer" that defines the time left on the executing thread's
    timeslice. When that timer expires, the scheduler is invoked to
    "preempt" the current thread [there are lots of caveats, here,
    that can be ignored... important thing is "scheduler sits atop
    timing service"]. Again, the mechanism by which the scheduler
    is invoked is not important (and can be varied).

    [Note that the "tick" can thus be semantically different from
    the "jiffy". Further, that threads can have different size
    timeslices -- e.g., this allows finer partitioning of the
    CPU *within* a priority level]

    Also common with this scheme, time doesn't exist outside of
    the scope of the "tick". A practical consequence of this is
    that a thread that yields/blocks just prior to its timeslice
    ending (to be pedantic: "just prior to the next tick") can
    *cheat* the next thread to execute out of a portion of its
    timeslice. This is because the scheduler runs (because of the
    yield) AS IF it had run at the preceding tick. So, the next
    thread is activated... and a tick comes along (coincidentally)
    immediately thereafter! :>

    You can work around this by crediting the thread with a partial
    tick (in effect, increasing its timeslice by [0,1) tick) -- but
    that just adds a bias in the other direction (i.e., instead of getting
    cheated out of a tick, it *gets* some portion of a free tick).

    To "fix" this problem, you can have the scheduler restart the
    "tick interval" as it activates the next thread. In effect,
    resynchronizing the tick with the jiffy at each thread activation.
    In that case, a thread that does NOT block or yield is guaranteed
    its entire timeslice.

    But, now the tick is running at a varying frequency. (and, the
    scheduler is having to muck with hardware!) This complicates
    use of the tick (which drives the jiffy) for reliable timing.

    You can "fix" this problem by running the timing services independant
    of the scheduler (e.g., another "clock"). Or, you can "do the math"
    (keep track of residuals) to track the "jitter" in this "clock"
    so that scheduling/timing are unaffected (i.e., compensated).

    Or, just live with the problem. (i.e., pretend it doesn't happen often
    and/or that it happens "equally" to all threads)

    What I was suggesting was letting the timing service run the
    IRQ exclusively. It peeks at an ordered queue of "timers"
    and programs the hardware timer (clock) with the period of
    the next timer (at head of queue). I.e., this is the soonest
    "thing" (temporal event) that is going to happen so any "clock"
    IRQ's prior to that would be "uneventful".

    Now, have the scheduler feed events (which should be at most one?)
    into that same queue. So, coincident with activating the next
    thread to execute, the scheduler injects a timer event into that
    queue with an expiration time of "now + timeslice" -- and *remembers*
    a handle for that event!

    If the scheduler gets invoked prior to that time, it has the
    responsibility of deleting that (no longer pertinent) event
    and reinserting a *new* one for the thread that it is activating
    *now* (which may have a different timeslice!).

    The problem with this approach is there is nothing that limits
    the granularity of events in that queue. E.g., if you allow time
    to be specified in "timer clock cycles", then you could
    conceivably have the two events in the queue differing by only
    a few clock cycles -- too close temporally to be handled as
    distinct events. (so, you want to be restrict the values
    that can be used for "times" -- which brings you back to the
    first implementation -- effectively)
    Correct. I allow any number of "general purpose" timers.
    Note that I also allow timers to be inspected. I.e., a
    consumer can "time" how long something takes (and that
    thing can, in turn, time how long parts of *it* take, etc.).

    This is essential if you want your code to adapt to the
    temporal constraints of its environment (instead of just
    breaking -- missed deadline -- as load increases). It
    makes more sense to implement "timers" (as in "measure
    elapsed time") this way and have the consumer track the
    "initial value" than to have the timing service do it.
    Yes -- but then you can't drag the timer out of the user's
    address space. I.e., the user can always directly access
    the contents of the timer struct in your implementation.
    My API allows the OS to "hide"/protect the timer *from*
    the "user" (a bug in your code overwriting some key part
    of the struct could crash the timer service, for example).
    Yes, though see above.
    Yes, the current task (thread) essentially acts as its own
    timing service. OTOH, there are no guarantees that it will
    promptly "see" an expired timer (e.g., the entire thread may
    be blocked when the timer expires; or, the thread may not
    be executing that code when the timer actually expires).

    The first MTOS that I designed used a similar timing service
    except the timers were updated by a "timer thread" (i.e.,
    timers were no different from any other "task"/service).
    The timer thread, in turn, polled a global variable updated
    by the tick ISR (i.e., nothing happened in the tick ISR
    other than incrementing a counter). Timers were arranged
    in an array (that the timer thread managed). A consumer
    could directly reference the value *in* a timer (e.g.,
    by accessing "timer").

    This made for an incredibly powerful programming model.
    Since checking the timer was very efficient, you wrote
    code like:

    if (DATA_AVAILABLE != check_uart()) {
    /* nothing here, yet */
    if (timer[UART_TIMEOUT] > 0) {
    /* timer hasn't yet expired */

    /* keep waiting */
    } else {
    /* timeout waiting for data */
    } else {
    /* received data; process it */

    I.e., it was *faster* to just peek at the timer's
    current value and "yield" than it was to implement
    an API to "wait_for_timer()". It also let you *do*
    things while waiting which, otherwise, required a
    separate thread *or* a heavier implementation.

    (I still use this approach on *tiny* designs)
    Exactly. The "current activity" is a service invocation.
    Since service invocations are synchronous (my choice), there
    can be only one (for each thread). Note that this does not
    preclude having general purpose timers -- as many as you
    want -- in addition to this "service timer". I'm just
    exploiting the fact that the "service timer" has a fixed
    and inflexible role. It need not have the same capabilities
    of a general purpose timer (e.g., it *belongs* to a particular
    thread; something that isn't necessarily a requirement to
    be imposed on general purpose timers).

    I think this will let me make it lightweight enough that
    it ends up a net savings over using a general purpose timer
    for the same purpose (both in terms of space *and* time).
    No. OS doesn't have to "abort" anything. I.e., if I were
    to implement deadline handlers to do this same sort of thing,
    then they *would* (typically) abort pending/blocked operations
    9service calls). [this makes coding the application simpler
    but adds a lot of machinery in the OS -- as well as forcing the
    application to handle the asynchronous nature of aborted

    If, OTOH, I adopt the *practice* that every service supports
    the concept of a timeout and is *coded* with this in mind,
    then they can assume (at some cost) the responsibility of
    watching -- and compensating -- for their own missed deadlines.
    D Yuniskis, Dec 10, 2010
  14. D Yuniskis

    D Yuniskis Guest

    hi Robert,

    [chomp (i.e., "big snip" :> )]
    Sorry, that's what I meant: "X seconds from now" (now being the
    entry just before the head of the list).
    So, some "buckets" can be "empty".
    Are you discarding "resolution" (bad choice of word) as the
    interval increases? E.g., do timers specified for 8190 and 8191
    ms (from now) *both* expire at the same ACTUAL instant?
    <frown> Too much chance of my math and yours not agreeing so
    I won't follow the numbers. Rather, the answer to my above
    question should clarify weather *your* math is tracking "all
    10,000 of those milliseconds" *or* is just rounding it
    off to the nearest X.

    I.e., is the time stored in the struct as well as impliclty
    in the list in which it resides (regardless of which tier)?
    If so, then each "redistribution" requires you to examine
    every timer in the first bucket of "the second tier" and
    disperse them to potentially different buckets in the *first*
    tier -- based on their actual values (in effect, "time remaining").

    OTOH, if you are using coarser "resolutions" for each successive
    tier, then 10000 ms is the same as 10001 ms (i.e., you are throwing
    away some portion of your specified time)
    I can't comment on that without clarification of the above. :-/
    D Yuniskis, Dec 10, 2010
  15. D Yuniskis

    D Yuniskis Guest

    Hi Jon,

    Comer, Tanenbaum, Organick, Savitzky, Foster, McKusick, Bach,
    Boykin, etc. I read Comer more than 20 years ago. If you
    read this thread carefully, you'll see I've already alluded
    to delta queues.

    The problem isn't "how to implement timers". Rather, the
    problem is "how to implement timers that are used in a
    specific way" (the implication being that some economies
    can be gained by rigidly constraining *how* those timers
    would be used).

    All of the above "suffer" from the fact that they deal
    with *generic* timing instead of its application to
    (this) specific goal. And, they all tend to be "heavy"

    E.g., if you were implementing a set of "reminders" for
    an appointment book/calendar (I always gravitate to PDAs
    for examples :> ), you probably *wouldn't* use nanosleep()!
    Aside from having far too much precision, it would
    complicate the implementation: how much time between
    "now" and "whenever I need to signal the user that it
    is time for his 'appointment'"? The code would be
    overly complex and its purpose harder to fathom.

    Instead, you would have some other "timing" layer that
    presented something more user-oriented to your code.
    E.g., the granularity of "appointments" would be
    deliberately synthesized to lump "alarms" into
    single, discrete times (so you don't end up with
    alarms going off three milliseconds apart, etc.)
    Forget *how* to implement timers. Instead, think about
    "how to implement timers that will ONLY be used in this
    fashion" :> (timing "services")
    D Yuniskis, Dec 10, 2010
  16. D Yuniskis

    Jon Kirwan Guest

    Thanks. I hadn't read it carefully, as I'd indicated. And
    this is a good answer to me. Just wanted to ask.

    Jon Kirwan, Dec 10, 2010

  17. Sure. And an empty bucket is trivial to deal with at the tick.

    No. They'll probably move from the second to first tier in the same
    batch, but end up in adjacent tier 1 buckets, and then fire at
    successive ticks.

    That's correct. The timer stores its actual expiration time.
    Neighboring timer's "declump" as the move towards the finer resolution

    No, the other way...
    robertwessel2, Dec 10, 2010
  18. D Yuniskis

    D Yuniskis Guest

    Hi Robert,
    OK, the point I was trying to identify was that you *do*
    track the "actual" interval specified (and don't just
    "round it off" to coarser granularity as it increases)
    So, in all but the first tier (assuming the first tier *always*
    has the granularity of the tick), the "redistribution" requires
    examining each timer as you "promote" (demote?) it to the finer
    granularity of the tier above (below? I guess it depends on which
    way you look at things :> ) to REDISTRIBUTE the individual timers
    into the buckets on that new tier.

    However, you should only have to do this for (at most) the first
    "bucket" of each tier when that time comes (?)

    It seems like this approach is geared towards lots of timers
    of wide dynamic range. The overhead seems geared towards
    letting you ignore most/many of them until close to their
    expiration time (though you have an ongoing, sporadic effort
    to *maintain* them as they move to finer grained "buckets"
    (i.e., you have to revisit them several times before they
    D Yuniskis, Dec 12, 2010

  19. Pretty much. Although I would consider the timers to be basically
    wholly ignored between initial insertion and dispatch, except for any
    required redistributions.

    It has the advantage of scalability, as the amount of work done per
    timer does not increase as the number of timers in the system
    increases (which is not the case for most schemes), and not placing
    any limits on the range of the timers (other than being tick-grained).

    The number of redistributions (each of which is very low cost), is
    always small (no more than six in the configuration we're discussing),
    and occurs over the lifetime of the timer. The total work expended
    for a timer over its lifetime is of a fairly small range - on the
    order of 3:1 for a very long duration timer initially deposited into
    the highest tier vs. a short one initially placed in the lowest. The
    most expensive operation being the initial tier selection (for which a
    good Count Leading Zeros instruction is very helpful). Certainly the
    simple redistribution algorithm is bursty, but smoothing that out is
    not hard (you have 64 ticks to complete the redistribution of a bucket
    from the second tier to the first, and you can just do 1/64th of the
    work each tick). The initial insertion and final dispatch are also
    lower cost than most schemes (for example, the traditional priority
    queue), for even a modest number of timers.

    Obviously not counting the actual work of processing each timer when
    it fires.

    As applied to your original post, you could attach timers to things
    pretty much at whim, and other than storage costs, have them add very
    little overhead to the system, without the complexity of having to
    share timers.
    robertwessel2, Dec 13, 2010
  20. D Yuniskis

    D Yuniskis Guest

    Hi Robert,

    Yes -- though you would also have to deal with new timer activity
    while doing that processing.
    I think this is worth "filing away" for use in a "richer"
    environment. But, I think (for this project), I can hack
    a "service timer" into each thread and fold the support for
    it into the scheduler as if it was a core system feature
    (which it will be) -- just like any other aspect of the
    system. I think the distinguishing characteristic will be
    the range of "intervals" and their temporal distribution
    (e.g., something that effectively sits in a loop "steady state"
    *probably* won't benefit from the extra structure ?)
    D Yuniskis, Dec 15, 2010
    1. Advertisements

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments (here). After that, you can post your question and our members will help you out.