
Intel details future 'Larrabee' graphics chip

Discussion in 'Intel' started by NV55, Aug 4, 2008.

  1. NV55

    NV55 Guest


    Intel has disclosed details on a chip that will compete directly with
    Nvidia and ATI and may take it into uncharted technological and
    market-segment waters.

    Larrabee will be a stand-alone chip, meaning it will be very different
    from the low-end--but widely used--integrated graphics that Intel now
    offers as part of the silicon that accompanies its processors. And
    Larrabee will be based on the universal Intel x86 architecture.

    The first Larrabee product will be "targeted at the personal computer
    market," according to Intel. This means the PC gaming market--putting
    Nvidia and AMD-ATI directly into Intel's sights. Nvidia and AMD-ATI
    currently dominate the market for "discrete" or stand-alone graphics
    processing units.


    Larry Seiler (standing, middle), a senior Intel engineer, and Stephen
    Junkins (sitting, right), an Intel graphics software architect, speak
    at a briefing on the Larrabee chip, due in 2009-2010.
    (Credit: Brooke Crothers)

    As Intel sees it, Larrabee combines the best attributes of a central
    processing unit (CPU) with a graphics processor. "The thing we need is
    an architecture that combines the full programmability of the CPU with
    the kinds of parallelism and other special capabilities of graphics
    processors. And that architecture is Larrabee," Larry Seiler, a senior
    principal engineer in Intel's Visual Computing Group, said at a
    briefing on Larrabee in San Francisco last week.

    "It is not a GPU as many have mistakenly described it, but it can do
    most graphics functions," Jon Peddie of Jon Peddie Research, said in
    an article he posted Friday about Larrabee.

    "It looks like a GPU and acts like a GPU but actually what it's doing
    is introducing a large number of x86 cores into your PC," said Intel
    spokesperson Nick Knupffer, alluding to the myriad ways Larrabee could
    be used beyond just graphics processing. In addition to the PC, high-
    performance computing and workstations are two potential markets that
    were also mentioned.

    Intel describes it in a statement as "the industry's first many-core
    x86 Intel architecture." The chipmaker currently offers quad-core
    processors and will offer eight-core processors based on its Nehalem
    architecture, but Larrabee is expected to have dozens of cores and,
    later, possibly hundreds.

    The number of cores in each Larrabee chip may vary according to
    market segment. Intel showed a slide with core counts ranging from 8
    to 48, claiming performance scales almost linearly as more cores are
    added: that is, 16 cores will offer twice the performance of eight
    cores.

    The individual cores in Larrabee are derived from the Intel Pentium
    processor and "then we added 64-bit instructions and multi-threading,"
    Seiler said. Each core has 256 kilobytes of level-2 cache, allowing the
    size of the cache to scale with the total number of cores, according
    to Seiler. And application programming interfaces (APIs) such as
    Microsoft's DirectX and Apple's OpenCL can be tapped. "Larrabee does
    not require a special API. Larrabee will excel on standard graphics
    APIs," he said. "So existing games will be able to run on Larrabee."

    So, what is Larrabee's market potential? Today, the graphics chip
    market is approaching 400 million units a year and has consolidated
    into a handful of suppliers. "And of that population, two suppliers,
    ATI and Nvidia, own 98 percent of the discrete GPU business,"
    according to Peddie.

    "And the trend line indicates a flattening to decline in the
    business...However, Intel is no light-weight start up, and to enter
    the market today a company has to have a major infrastructure, deep IP
    (intellectual property), and marketing prowess--Intel has all that and
    more," Peddie said.


    Larrabee combines aspects of a CPU and GPU
    (Credit: Intel)

    Though more details will be provided at Siggraph 2008, here are some
    key Larrabee features:

    Larrabee programming model: supports a variety of highly parallel
    applications, including those that use irregular data structures. This
    enables development of graphics APIs, rapid innovation of new graphics
    algorithms, and true general purpose computation on the graphics
    processor with established PC software development tools.

    Software-based scheduling: Larrabee features task scheduling that is
    performed entirely in software, rather than in fixed-function logic.
    Rendering pipelines and other complex software systems can therefore
    adjust their resource scheduling based on each workload's unique
    computing demands (a rough sketch of the idea follows after this
    feature list).

    Execution threads: Larrabee architecture supports four execution
    threads per core with separate register sets per thread. This allows
    the use of a simple efficient in-order pipeline, but retains many of
    the latency-hiding benefits of more complex out-of-order pipelines
    when running highly parallel applications.

    Ring network: Larrabee uses a 1,024-bit-wide, bidirectional ring
    network (512 bits in each direction, or 64 bytes per direction per
    ring clock) that lets agents communicate with each other at low
    latency, giving very fast communication between cores.
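
    To picture the "software-based scheduling" item in code terms, here is
    a minimal sketch: an ordinary mutex-protected work queue handing
    hypothetical render-tile tasks to worker threads. The task granularity,
    names, and worker count are all invented for illustration; Intel has
    not described its actual scheduler in this material.

    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>
    #include <vector>

    // Hypothetical unit of rendering work: one screen tile of one frame.
    struct RenderTask { int frame; int tile; };

    // A software scheduler is just ordinary code: a queue drained by worker
    // threads. Because the queue is plain data, a renderer could reorder,
    // split, or reprioritize tasks per workload, which fixed-function
    // scheduling hardware cannot.
    class SoftwareScheduler {
    public:
        void submit(RenderTask t) {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(t);
        }
        bool next(RenderTask* out) {
            std::lock_guard<std::mutex> lock(mu_);
            if (tasks_.empty()) return false;
            *out = tasks_.front();
            tasks_.pop();
            return true;
        }
    private:
        std::mutex mu_;
        std::queue<RenderTask> tasks_;
    };

    int main() {
        SoftwareScheduler sched;
        for (int tile = 0; tile < 64; ++tile) sched.submit(RenderTask{0, tile});

        const int kWorkers = 4;  // stand-in for "however many cores this workload gets"
        std::vector<std::thread> workers;
        for (int w = 0; w < kWorkers; ++w) {
            workers.emplace_back([&sched, w] {
                RenderTask t;
                while (sched.next(&t)) {
                    // A real renderer would shade the tile here.
                    printf("worker %d: frame %d, tile %d\n", w, t.frame, t.tile);
                }
            });
        }
        for (auto& th : workers) th.join();
        return 0;
    }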

    "A key characteristic of this vector processor is a property we call
    being vector complete...You can run 16 pixels in parallel, 16 vertices
    in parallel, or 16 more general program indications in parallel,"
    Seiler said.
    NV55, Aug 4, 2008


  2. Miles Bader

    Miles Bader Guest

    An interesting project, but man, now the hype machine starts up... :-(

    Miles Bader, Aug 5, 2008
  3. NV55

    NV55 Guest

    A very pessimistic Larrabee article by Peter Glaskowsky... I recall
    him also being negative about Cell during its development.


    Intel's Larrabee--more and less than meets the eye
    Posted by Peter Glaskowsky

    Intel announced on Monday that it will be presenting a paper at
    Siggraph 2008 about its "many-core" Larrabee architecture, which will
    be the basis of future Intel graphics processors.

    The paper itself, however, has already been published, and I was able
    to get a copy of it. (Unfortunately, as you'll see at that link, the
    paper is normally available only to members of the Association for
    Computing Machinery.)


    Intel's Larrabee includes "many" cores, on-chip memory controllers, a
    wide ring bus for on-chip communications, and a small amount of
    graphics-specific logic.
    (Credit: Intel)

    The paper is a pretty thorough summary of Intel's motives for
    developing Larrabee and the major features of the new architecture.
    Basically, Larrabee is about using many simple x86 cores--more than
    you'd see in the central processor (CPU) of the system--to implement a
    graphics processor (GPU). This concept has received a lot of attention
    since Intel first started talking about it last year.

    The paper also answers perhaps the biggest unanswered question about
    Larrabee--what are the cores, and how can Intel put "many" of them on
    a chip when desktop CPUs are still moving from two to four cores?

    Intel describes the Larrabee cores as "derived from the Pentium
    processor," but I think perhaps this is an oversimplification. The
    design shown in the paper is only vaguely Pentium-like, with one
    execution unit for scalar (single-operation) instructions and one for
    vector (multiple-operation) instructions.


    The Larrabee core contains only two execution units: one for scalar
    operations, one for vector operations.
    (Credit: Intel)

    That's the basic answer: Larrabee cores just have less going on. A
    quad-core desktop processor might have six or more execution units,
    and a lot of special logic to let it reorder instructions and execute
    code past conditional branches just in case it can guess the direction
    of the branch correctly. This complexity is necessary to maximize
    performance in a lot of desktop software, but it's not needed for
    linear, predictable code--which is what we usually find in 3D-
    rendering software.

    But the vector unit in Larrabee is much more powerful than anything in
    older Intel processors--or even in the current Core 2 chips--because
    3D rendering needs to do a lot of vector processing. The vector unit
    can perform 16 single-precision floating-point operations in parallel
    from a single instruction, which works out to 512 bits wide--great for
    graphics, though it would be overkill for a general-purpose processor,
    which is why the vector units in mainstream CPUs are 128 or 256 bits
    wide at most.

    The new vector unit also supports three-operand instructions, probably
    including the classic "A * B + C" operation that is so common in many
    applications, including graphics. With three operands and two
    calculations per instruction, the peak throughput of a single Larrabee
    core should be 32 operations per cycle, and that's just what the paper
    says.

    I say "probably" because the Siggraph paper doesn't describe exactly
    what operations will be implemented in the vector unit, but I suspect
    this part of the Larrabee design is related to Intel's Advanced Vector
    Extensions, announced last April. The first implementations of AVX for
    desktop CPUs will apparently begin with a 256-bit design, another
    indication of how unusual it is for Larrabee to have a 512-bit vector
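
    If the three-operand operation really is a multiply-add of the
    "A * B + C" form (as noted above, the paper doesn't spell out the exact
    operation set), the width and throughput arithmetic can be sketched
    directly: 16 lanes of 32-bit floats is 512 bits, and a multiply plus an
    add per lane gives 32 floating-point operations per instruction.

    #include <array>
    #include <cstdio>

    // Plain-C++ stand-in for one hypothetical 16-wide multiply-add, a * b + c.
    // Hardware would issue this as a single vector instruction; the point here
    // is only the counting: 16 lanes x 2 ops = 32 floating-point operations.
    using Vec16 = std::array<float, 16>;

    Vec16 multiply_add(const Vec16& a, const Vec16& b, const Vec16& c) {
        Vec16 r{};
        for (int lane = 0; lane < 16; ++lane)
            r[lane] = a[lane] * b[lane] + c[lane];  // one multiply + one add per lane
        return r;
    }

    int main() {
        const int lanes = 16, bits_per_lane = 32, ops_per_lane = 2;
        printf("vector width: %d bits, ops per instruction: %d\n",
               lanes * bits_per_lane, lanes * ops_per_lane);

        Vec16 a, b, c;
        a.fill(1.0f); b.fill(2.0f); c.fill(3.0f);
        printf("lane 0 of a*b+c: %.1f\n", multiply_add(a, b, c)[0]);
        return 0;
    }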

    The multithreading factor
    Intel also built four-way multithreading into the Larrabee cores. Each
    Larrabee core can save all the register data from four separate
    threads in hardware, so that most thread-switch operations can be
    performed almost instantly rather than having to save one set of
    registers to main memory and load another. This approach is a
    reasonable compromise for reducing thread-switching overhead, although
    it probably consumes a significant amount of silicon.
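
    A loose way to picture the difference from conventional save-and-restore
    thread switching (the structures below are invented for illustration;
    the paper doesn't describe the register file at this level of detail):

    #include <cstdint>
    #include <cstdio>

    // Illustration only: a core that keeps four complete register sets resident
    // on chip can "switch threads" by changing an index, instead of spilling one
    // register set to memory and loading another.
    struct RegisterSet { uint64_t gpr[16]; };  // hypothetical per-thread state

    struct Core {
        RegisterSet contexts[4];  // four hardware thread contexts
        int active = 0;

        void switch_thread() {
            // No memory traffic for register state -- that is the latency-hiding
            // trick: when one thread stalls (e.g., on a cache miss), run another.
            active = (active + 1) % 4;
        }
    };

    int main() {
        Core core;
        for (int i = 0; i < 6; ++i) {
            printf("running hardware thread %d\n", core.active);
            core.switch_thread();
        }
        return 0;
    }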

    Note that this kind of multithreading in Larrabee is very different
    from the Hyper-Threading technology Intel uses on Pentium 4, Atom, and
    future Nehalem processors. Hyper-Threading (aka simultaneous multi-
    threading) allows multiple threads to execute simultaneously on a
    single core, but this only makes sense when there are many execution
    units in the core. Larrabee's two execution units are not enough to
    share this way.

    All of these differences prove rather conclusively that Larrabee's
    cores are not the same as the cores in Intel's Atom processors (also
    known as Silverthorne). That surprised me; the Atom core seemed fairly
    appropriate for the Larrabee project. All that really should have been
    necessary was to graft a wider vector unit onto the Atom design. But
    now I suppose the Atom and Larrabee projects have been completely
    independent from one another all along.

    Intel won't say how many cores are in the first chip. The paper
    describes an on-chip ring network that connects the cores. The network
    is 512 bits wide. Interestingly, the paper mentions that there are two
    different ring designs--one for Larrabee chips with up to 16 cores,
    and one for larger chips. That suggests Intel has chips planned with
    relatively small numbers of cores, possibly as few as four or eight.
    Such small implementations might be appropriate for Intel's future
    integrated-graphics chip sets, but as such they will be very slow by
    comparison with contemporary discrete GPUs, just as Intel's current
    products are.

    Larrabee provides some graphics-specific logic in addition to the CPU
    cores, but not much. The paper says that many tasks traditionally
    performed by fixed-function circuits, such as rasterization and
    blending, are performed in software on Larrabee. This is likely to be
    a disadvantage for Larrabee, since a software solution will inevitably
    consume more power than optimized logic--and consume computing
    resources that could have been used for other purposes. I suspect this
    was a time-to-market decision: tape out first, write software later.

    The paper says Larrabee does provide fixed-function logic for texture
    filtering because filtering requires steps that don't fit as well into
    a CPU core. I presume there's other fixed-function logic in Larrabee,
    but the paper doesn't say.

    Larrabee's rendering code uses binning, a technique that has been used
    in many software and hardware 3D solutions over the years, sometimes
    under names such as "tiling" and "chunking." Binning divides the
    screen into regions and identifies which polygons will appear in each
    region, then renders each region separately. It's a sensible choice
    for Larrabee, since each region can be assigned to a separate core.

    Binning also reduces memory bandwidth, since it's easier for each core
    to keep track of the smaller number of polygons assigned to it. The
    cores are less likely to need to go out to main memory for additional
    data.
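
    A rough sketch of the binning step described above: split the screen
    into fixed-size tiles and record, for each tile, which triangles
    overlap it, so that each tile's list can later be rendered
    independently (for instance, one bin per core). The tile size,
    resolution, and data structures here are assumptions for illustration.

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // Bounding box of a triangle in screen coordinates (the actual triangle
    // setup is omitted; binning only needs the box).
    struct Tri { float minx, miny, maxx, maxy; };

    int main() {
        const int screen_w = 1024, screen_h = 768, tile = 64;  // assumed sizes
        const int tiles_x = (screen_w + tile - 1) / tile;
        const int tiles_y = (screen_h + tile - 1) / tile;
        std::vector<std::vector<int>> bins(tiles_x * tiles_y);

        std::vector<Tri> tris = {{10, 10, 90, 50}, {500, 300, 700, 400}};
        for (int i = 0; i < (int)tris.size(); ++i) {
            const Tri& t = tris[i];
            int tx0 = std::max(0, (int)t.minx / tile);
            int ty0 = std::max(0, (int)t.miny / tile);
            int tx1 = std::min(tiles_x - 1, (int)t.maxx / tile);
            int ty1 = std::min(tiles_y - 1, (int)t.maxy / tile);
            for (int ty = ty0; ty <= ty1; ++ty)
                for (int tx = tx0; tx <= tx1; ++tx)
                    bins[ty * tiles_x + tx].push_back(i);  // triangle i touches this tile
        }

        for (int b = 0; b < (int)bins.size(); ++b)
            if (!bins[b].empty())
                printf("tile %d holds %zu triangle(s)\n", b, bins[b].size());
        return 0;
    }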

    The numbers crunch
    The paper gives some performance numbers, but they're hard to
    interpret. For example, game benchmarks were constructed by running a
    scene through a game, then taking only widely separated frames for
    testing on the Intel design. In the F.E.A.R. game, for example, only
    every 100th frame was used in the tests. This creates an unusually
    difficult situation for Larrabee; there's likely to be much less reuse
    of information from one frame to the next.

    But given that limitation of the test procedure, the results don't
    look very good. To render F.E.A.R. at 60 frames per second--a common
    definition of good-enough gaming performance--required from 7 to 25
    cores, assuming each was running at 1GHz. Although there's a range
    here depending on the complexity of each frame, good gameplay requires
    maintaining a high frame rate--so it's possible that F.E.A.R. would,
    in practice, require at least a 16-core Larrabee processor.

    And that's about the performance of a 2006-vintage Nvidia or Advanced
    Micro Devices/ATI graphics chip. This year's chips are three to four
    times as fast.

    In other words, unless Intel is prepared to make big, hot Larrabee
    chips, I don't think it's going to be competitive with today's best
    graphics chips on games.

    Intel can certainly do that--no other semiconductor company on Earth
    can afford to make big chips the way Intel can--but that would ruin
    Intel's gross margins, which are how Wall Street judges the company.
    Also, Intel's newest processor fabs are optimized for high-performance
    logic, like that used in Core 2 processors. Larrabee runs more slowly,
    suggesting it could be economically manufactured on ASIC product
    lines... but Intel's ASIC lines are all relatively old, refitted CPU
    fabs.

    Nvidia, by comparison, gets around this problem by designing its chips
    from the beginning to be made in modern ASIC factories, chiefly those
    run by TSMC. Although these factories are a generation behind Intel's
    in process technology, they're much less expensive to operate. So this
    may be a situation where Intel's process edge doesn't mean as much as
    it does in the CPU business.

    The Larrabee programming model also supports nongraphics applications.
    Since it's fundamentally just a multicore x86 processor, it can do
    anything a regular CPU can do. Intel's paper even uses Sun
    Microsystems' term, Throughput Computing, for multicore processing.

    The Larrabee cores aren't nearly as powerful as ordinary notebook or
    desktop processors for most applications. Real Larrabee chips could be
    faster or slower than the 1GHz reference frequency used in the paper,
    but there's definitely only one execution unit for the scalar
    operations that make up the bulk of operating-system and office
    software. That means a single Larrabee core would feel slow even when
    compared with a Pentium III processor at the same frequency, never
    mind a Core 2 Duo.

    But with such a strong vector unit, a Larrabee core could be very good
    at video encoding and other tasks, especially those that use floating-
    point math. At 1GHz, a single Larrabee core hits a theoretical 32
    GFLOPS (32 billion floating-point operations per second). A 32-core
    Larrabee chip could exceed a teraflop--roughly the performance of
    Nvidia's latest GPU, the GTX 280, which has 240 (very simple) cores.
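
    The arithmetic behind those figures, taking the 1GHz reference clock and
    the 32-operations-per-cycle peak at face value:

    #include <cstdio>

    int main() {
        const double clock_ghz = 1.0;       // reference frequency used in the paper
        const double ops_per_cycle = 32.0;  // 16 lanes x 2 ops (multiply-add)
        const double core_gflops = clock_ghz * ops_per_cycle;   // 32 GFLOPS per core
        const double chip_gflops = 32 * core_gflops;             // hypothetical 32-core chip
        printf("per core: %.0f GFLOPS; 32 cores: %.0f GFLOPS (~%.2f TFLOPS)\n",
               core_gflops, chip_gflops, chip_gflops / 1000.0);
        return 0;
    }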

    But I don't expect to see that kind of performance from the first
    Larrabee chips. The power consumption of a 32-core design with all the
    extra overhead required by x86 processing would be very high. Even
    with Intel's advantages in process technology, such a large Larrabee
    chip would probably be commercially impractical. Smaller Larrabee
    designs may find some niche applications, however, acting as number-
    crunching coprocessors much as IBM's Cell chips do in some systems.

    And although a Larrabee chip could, in principle, be exposed to
    Windows or Mac OS X to act as a collection of additional CPU cores,
    that wouldn't work very well in the real world and Intel has no
    intention of using it that way. Instead, Larrabee will be used like a
    coprocessor. In that application, Larrabee's x86 compatibility isn't
    worth very much.

    The bottom line
    So...what's Larrabee good for, and why did Intel bother with it?

    I think maybe this was a science project that got out of hand. It came
    along just as AMD was buying ATI and so positioning itself as a leader
    in CPU-GPU integration. Intel had (and still has) no competitive GPU
    technology, but perhaps it saw Larrabee as a way to blur the line
    distinguishing CPUs from GPUs, allowing Intel to leverage its
    expertise in CPU design into the GPU space as well.

    Intel may have paid too much attention to some of its own researchers,
    who have been touting ray tracing as a potential alternative to
    traditional polygon rendering. I wrote about this in some
    depth back in June ("Ray tracing for PCs--a bad idea whose time has
    come"). But ray tracing merits just one paragraph and one figure in
    this paper, which establish merely that Larrabee is more efficient at
    ray tracing than an ordinary Xeon server processor. It falls well
    short of establishing that ray tracing is a viable option on Larrabee.

    Future members of the Larrabee family may be good GPUs, but from what
    I can see in this paper, the first Larrabee products will be too slow,
    too expensive, and too hot to be commercially competitive. It may be
    several more years beyond the expected 2009/2010 debut of the first
    Larrabee parts before we find out just how much of Intel's CPU know-
    how is transferable to the GPU market.

    I'll be at Siggraph again this year, and I'll have more to say after
    I've read this paper through a few more times and had a chance to
    speak with some of the folks I know at AMD, Nvidia, and other
    companies in the graphics market.
    NV55, Aug 10, 2008
  4. Jure Sah

    Jure Sah Guest

    NV55 wrote:
    As I recall, something very similar was said about the VIA Chrome9. For
    the record, that chip can't even do 2D rendering decently.
    Very different? The motherboard-based GPUs are not really as dependent
    on other hardware as you make it sound; the only difference between one
    of them and a classic GPU is that it uses the chipset to do memory
    access, just as much as any Intel CPU that you consider independent (a
    fact at the core of its poor performance).
    As Intel marketing sees it, you mean? "The best attributes of both" is
    such a pile of crap; GPUs are made the way they are for a very good
    reason: they do graphics faster that way. A text obviously written for
    a community not well versed in the inner workings of a GPU.

    A GPU is not simply your standard serial CPU with a few special
    functions slapped on, nor is parallelism in itself the particularly
    notable feature of its architecture. A GPU, preprogrammed with data that
    could be interpreted as an instruction set, is capable of executing the
    selected commands on a data block simultaneously -- unlike a multicore
    CPU, which can perform two or more operations only on unrelated bits of
    data, the GPU performs all of the selected instructions on the data
    block it is working with at once, in one clock. This makes it an
    extremely powerful tool for performing the same specific set of
    instructions on a large amount of data, a task common in graphics
    manipulation. This also makes it inherently incompatible with the x86
    architecture; it can only be programmed in a similar manner with the use
    of a complex compiler, which interprets the programmer's code and
    organizes it into blocks of simultaneous operations to the best of its
    ability -- a typically inefficient process that surely cannot be done
    very well on the fly.
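
    To put the distinction being drawn here into code terms (a loose
    illustration only, not any vendor's actual programming model): a
    data-parallel device applies one operation across a whole data block at
    once, whereas a multicore CPU runs separate instruction streams that
    each work on their own, unrelated data.

    #include <array>
    #include <cstddef>
    #include <cstdio>

    // Data-parallel style: one operation applied to every element of a block
    // "at once" (emulated here with a loop; on a GPU the lanes run together).
    template <std::size_t N>
    void scale_block(std::array<float, N>& block, float k) {
        for (float& x : block) x *= k;  // same instruction, whole data block
    }

    int main() {
        std::array<float, 8> block = {1, 2, 3, 4, 5, 6, 7, 8};
        scale_block(block, 2.0f);       // the "GPU-like" step: one op, many elements
        for (float x : block) printf("%.0f ", x);
        printf("\n");
        // A multicore CPU, by contrast, would run independent threads, each with
        // its own instruction stream touching its own data.
        return 0;
    }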

    If Intel is not using this advantage, and they're not, their magic new
    core will not have any of the advantages of the GPU.
    Yes indeed. It turns out it's not a GPU -- it's a Pentium.
    Yes, multicore computing is a buzzword these days. Seems like the old
    days of the 3.8 GHz Pentium or the 2 MB of L2 cache of the Pentium II
    all over again. So long as it's big numbers, everybody's buying it,
    regardless of whether it actually helps or hurts performance.
    And most of all, Intel has crappy GPUs. There are other graphics chip
    providers on the market today, but their market share isn't much, for a
    very simple reason: their technology sucks. As far as GPUs are
    concerned, Intel is one of them.
    Which in other words means that, since their "GPU" is actually a simple
    Pentium CPU, they're pretending to beat nVidia's advanced compiler
    software with their much more trivial serial x86 compiler (see the
    description of GPU architecture above to understand what I mean).

    Frankly, I can't believe they're willing to market even their sheer
    laziness as a feature. But hey, it's Intel.
    Oh yes, the wonders of advanced technology: SOFTWARE ACCELERATION.
    According to them, their emulated hardware will outperform the real
    hardware in use by nVidia and ATI. Silly how *they* never thought to do
    something like that until Intel came up with it, don't you think?
    Oh my! 16 pixels!

    Remember, guys, this is 16 pixels at once through *one* operation, which
    may not be one where a pixel is affected by a neighbouring pixel, or you
    break the pipeline. In x86. Anyone who speaks x86 assembly around
    here probably knows just how crappy this is.

    My apologies but this crap is too much fun not to maliciously comment on.

    Jure Sah, Aug 13, 2008
  5. Yousuf Khan

    Yousuf Khan Guest

    It's starting to look like Larrabee is going to be a transitional
    technology, a throwaway, quickly-released-and-forgotten technology until
    their real technology comes out. It may have even been a marketing ploy
    to take attention away from AMD and, to a lesser extent, Nvidia.

    For all of this hype about Larrabee, they're saying that it won't even
    be a part of the forthcoming Nehalem integrated GPU. The Nehalem
    integrated GPU will be based on Intel's existing crappy GPUs. (The
    Nehalem integrated GPU will have a different codename, which I can't
    remember right now, but it's based on Nehalem anyway.)
    Actually, the reason other GPU makers' market share isn't as high as
    Intel's is that Intel includes its GPUs with every chipset it sells
    on a motherboard. The vast majority of Intel processors are paired
    with Intel chipsets, so the GPUs get high market share just by
    hitching along for the ride.
    There will be real x86/GPU instruction set integration with AMD, when
    they get their Fusion processor running. SSE5 is being designed for
    GPU-based acceleration.
    It's always fun to poke holes in marketing.

    Yousuf Khan
    Yousuf Khan, Aug 18, 2008
  6. Jure Sah

    Jure Sah Guest


    Yousuf Khan wrote:
    Sure, but there is a big difference between running all GPU functions
    software-emulated on an x86 Pentium chip and implementing a selected few
    CPU functions on the GPU with an x86 frontend.

    Implementing SSE-like instructions on a GPU chip (or anywhere outside a
    CPU) makes sense; it also goes along with, for example, AMD's 3DNow!
    approach, where new, faster functions are provided to existing software
    by giving them an x87 frontend.

    It's not like x86 is a universally superior architecture; it's good for
    some things and bad for others, and you have to know when to use it...
    but you could also say Intel's approach of creating completely new
    instruction sets every now and then is good for marketing and bad for...
    Jure Sah, Aug 22, 2008