
Re: Can someone explain step by step how one avoid many conditional in forth as described in Moore F

Discussion in 'Intel' started by Rod Pemberton, Jan 11, 2012.

  1. ....
    Is that still true for multiple cores?

    I.e., I would think the following is entirely possible and plausible, but I
    haven't studied a CPU design in decades. The CPU's designers could execute
    the process in parallel, with one core taking one direction of the branch
    and another core taking the other direction. Once the correct branch is
    determined, the bad execution path is discarded. If the primary core had
    the good execution path, it just continues execution. If the alternate
    core had the good execution path, its internal state could be "pushed" to
    the primary core. If they used static RAM for the internal state, then it
    could be "pushed" asynchronously, i.e., between clocks or sub-clocks. It
    would require reserving a core for the branch-path execution, at least
    temporarily.

    Rod Pemberton
    Rod Pemberton, Jan 11, 2012

  2. Search on "speculative execution". It can also be done at the compiler
    level; that doesn't require processor support. Intel's P6 was their
    first chip to support it iirc.
    Alex McDonald, Jan 11, 2012

  3. Rod Pemberton


    But don't believe any page that says that it is a solution to the
    problem of memory latency (on cache misses). It isn't, and won't
    ever be, for any 'normal' language and application. That's a MUCH
    harder task :-(

    Nick Maclaren.
    Nick Maclaren, Jan 11, 2012
  4. That's "Eager Execution" (EE), something which good branch prediction
    makes even less of a win: You can never go through more than 2-3
    branches like this before the power loss becomes totally horrendous
    (running 4-8 threads in parallel, 75-88% wasted resources), while lots
    of current code can branch predict past 6-8 branches before the
    probability of a branch miss passes 50%, and all those branches could
    actually be inside the pipeline at once.

    EE should only be considered as a last resort for branches where the
    runtime branch history shows very low predictability, i.e. closer to 50%
    branch misses.
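The arithmetic behind those numbers can be sketched in a few lines (a rough model; the 92% per-branch accuracy is an assumed illustrative figure, not from the post):

```python
# Rough model of eager execution cost vs. branch prediction reach.
# Assumption (illustrative, not from the post): 92% per-branch accuracy.

def eager_waste(k):
    """Fraction of work wasted when eagerly following k unresolved branches:
    2**k paths execute in parallel, and only one of them is the right one."""
    paths = 2 ** k
    return (paths - 1) / paths

def predict_all_correct(p, k):
    """Probability that k consecutive branches are all predicted correctly."""
    return p ** k

# Following 2-3 branches eagerly means running 4-8 parallel paths...
print(eager_waste(2))   # 0.75  -> 75% of the work wasted
print(eager_waste(3))   # 0.875 -> ~88% wasted

# ...while a 92%-accurate predictor still has a better-than-even chance
# of being entirely right 8 branches deep.
print(predict_all_correct(0.92, 8))
```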

    BTW, back in March 1995 I visited Intel in Santa Clara, in order to meet
    some people from the P6 team and to get an engineering sample PentiumPro.

    I had to sign an NDA up front, so before doing that I insisted that I'd
    be allowed to write down on a whiteboard everything I then knew or had
    been able to guess about the architecture, based on nothing but public
    sources. I.e. this was to properly delineate what the NDA could cover.

    It turned out that I was correct in every particular, except for one
    thing: I expected EE because I didn't realize how much they had been
    able to improve the branch predictor.

    The engineers on the other side of the table were very happy, while some
    of the management and legal people seemed quite worried. :)

    Terje Mathisen, Jan 11, 2012
  5. Rod Pemberton

    Arnold Doray Guest

    I don't think this is how it's done on modern CPUs. They use branch
    prediction, which can be as good as 70%-80% right, if I recall
    correctly. When the wrong branch is taken, the entire pipeline needs to
    be flushed. That's the reason to avoid conditionals.

    But let's say for argument's sake that you're right. There are a few
    problems with your scheme:

    1) It's slow, because you're tying up another core, so you effectively
    have one core fewer than before.

    2) Also, the switching overhead may be large if it has to be done often.

    3) What happens with code containing multiple branches? You can't follow
    them all. What to do?

    4) Let's suppose you have lots of cores so that (3) is not a big issue.
    Still, the number of instructions before the next branching could be less
    than your pipeline's depth. So you're wasting the advantage offered by a
    deep pipeline.

    Thinking about Gavino's question again, it occurs to me that with
    branch prediction, a long loop will almost never flush the pipeline
    (because the prediction is done by keeping statistics of the historical
    branch choices). So this reason alone (pipeline flushing) is not a good
    reason to avoid using loops. Perhaps CM's comments were made when the
    prediction was not as sophisticated as it is now.
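The loop claim is easy to demonstrate with a toy model of the classic two-bit saturating-counter predictor (a simplification; real predictors also use branch-history tables):

```python
# Toy two-bit saturating-counter branch predictor (a simplification of
# real hardware, which also keys on global/local branch history).
def mispredictions(outcomes):
    state = 3   # 0..3; state >= 2 means "predict taken"; start strongly-taken
    misses = 0
    for taken in outcomes:
        if (state >= 2) != taken:
            misses += 1
        state = min(3, state + 1) if taken else max(0, state - 1)
    return misses

# A loop's backward branch: taken 99 times, then not-taken once at exit.
loop = [True] * 99 + [False]
print(mispredictions(loop))  # 1 -> a single flush per loop exit, 99% accuracy
```

By contrast, feeding the same predictor a strictly alternating (maximally "flaky") branch pattern mispredicts about half the time, which is exactly the case where the posts above say eager execution might pay off.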

    Of course, avoiding conditionals entirely will give a faster solution.
    How much faster is the question. Does anyone have comments/experience
    with this?
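One standard way to avoid a data-dependent conditional entirely is a branchless select, shown here in Python with the integer-mask idiom (illustrative only; for the same pattern in C or assembly, compilers typically emit a cmov or predicated instruction instead of a branch):

```python
# Branchless select: pick a or b without a conditional jump.
# Python is used only for illustration of the mask idiom.
def select(cond, a, b):
    mask = -int(bool(cond))        # all-ones if cond is true, zero otherwise
    return b ^ ((a ^ b) & mask)    # mask chooses a; no branch on cond

print(select(True, 5, 9))   # 5
print(select(False, 5, 9))  # 9
```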

    Arnold Doray, Jan 11, 2012
  6. Taking both paths on a branch is currently called "eager execution".

    Branch prediction is one form of "speculative execution".

    P6 did speculative, but not eager. As, for that matter, did P5 (Pentium).

    I don't know of anyone doing full eager execution, although it has been
    studied out the bejeezus. Branch prediction usually beats it. I
    believe an IBM chip did eager ifetch - fetching both sides of a branch -
    but did not actually execute; it stalled at the decoder or thereabouts.
    Andy (Super) Glew, Jan 11, 2012
  7. Rod Pemberton

    Joe keane Guest

    Doing some 'early' loads seems like a good idea, assuming that you have
    some IF units free.

    But that is a *lot* different from the other suggestion, copying the
    machine state to a different core.
    Joe keane, Jan 13, 2012
  8. Microarchitectures proposed for eager execution:

    * using the same pipeline(s) as ordinary non-eager
    ** e.g. if stalled on the main path, execute on the alternative path
    ** DEE: e.g. if have gone too far down one path, execute on an
    alternative path that has higher relative probability of success than
    predicting yet another branch on the current path.

    Actually, the DEE technique applies to all forms of eager execution.

    * using eager threads on the same CPU core
    ** although not necessarily the same pipeline(s) - eager threading was
    one of the techniques I wanted to investigate on top of my multicluster
    multithreading (MCMT) substrate -- although in my experience eager
    nearly always loses out to non-eager forms of speculation.

    Note that this is not necessarily the same pipeline resources, since in
    my terminology an MCMT machine like Bulldozer is really a single CPU
    with multiple execution clusters.

    * using eager threads of execution on different CPU cores.

    Problem is that eager threads seldom last long, so it is hard for eager
    threads to overcome the cost of migrating to another core.
    Andy (Super) Glew, Jan 13, 2012
  9. Rod Pemberton

    Joe keane Guest

    Essentially creating a new thread for every -branch- falls into the
    'that don't sound right' category.
    Joe keane, Jan 14, 2012
  10. I've thought about this in the context "execute alternating instructions
    from both sides of the branch, adding implicit predicates to them", so
    that when the branch retires the predicates become known and half the
    instructions are cancelled.

    The problem is of course that you need to fetch from two paths at the
    same time, i.e. you need a pair of virtualized IP registers as well.

    The first criterion must be that you only ever do this for code that has
    been executed a number of times, and where the branch history has turned
    out to be very unpredictable.

    Another idea could be to start with the predicted branch, then switch to
    the other on the first stall/cache miss or new branch.

    Terje Mathisen, Jan 14, 2012
  11. This reminds me of a (not so useful) idea I posted here
    21 Oct 2010, "Move on 'commit' predication?" Message-ID:

    It was basically just a (wrong-headed?) microarchitecture
    for supporting dynamic predication (of hammock branches)
    where one path was less likely but uncertain enough to
    justify predication.

    (One of the issues of eager execution is recognizing joining.)
    Paul A. Clayton, Jan 14, 2012
  12. Nobody would seriously propose a new thread for every branch.

    You would only consider eager threading for flakey branches, where you
    don't know which way they are going to go. And won't know for a while.

    Myself, I think non-threading approaches to eager execution are better.
    E.g. the discussion of short forward branches - emit code that
    computes both sides of such a hammock, with operations like Phi nodes to
    bring it back to a single dataflow.
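A sketch of what that looks like at source level, assuming a simple if/else hammock (Python for illustration; in a compiler's SSA form, the final select would be an actual Phi node):

```python
# Compute both sides of a short hammock eagerly, then join the two
# values with a branchless select -- the software analogue of a Phi node.
# The then/else computations here are arbitrary illustrative stand-ins.
def hammock(x):
    then_val = x * 2       # "then" side, computed unconditionally
    else_val = x + 100     # "else" side, computed unconditionally
    mask = -int(x > 0)     # condition can resolve late; no branch taken
    return else_val ^ ((then_val ^ else_val) & mask)

print(hammock(7))    # 14 (then side: 7 * 2)
print(hammock(-3))   # 97 (else side: -3 + 100)
```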
    Andy (Super) Glew, Jan 15, 2012
