
ATI XENOS (X360 GPU) summary

Discussion in 'ATI' started by Guest, Jun 20, 2005.

  1. Guest

    Guest Guest

    Summary by GameMaster at Teamxbox forums
    http://forum.teamxbox.com/showthread.php?p=5465848&highlight=anisotropic#post5465848


    *XENOS is a "Split Processor" GPU, meaning that it is actually 2 GPU cores
    that are packaged together, with the "Parent" GPU handling the majority of
    shader tasks and acting as the "North Bridge" for the system, among other
    things. The "Daughter" GPU is directly linked to the "Parent" GPU and is
    the module that has the 10MB of eDRAM. There is a considerable amount of
    additional logic on the "Daughter" GPU that will process a number of things
    such as HDR, 4xMSAA/FSAA, Z Buffer (Depth), Alpha Buffer (Transparency),
    Stencil Buffer (Shadows), Occlusion Culling (Removing unseen polygons),
    Radiosity Lighting (such as Global Illumination), Real Time LOD (Level of
    Detail/Tessellation), and something that ATI refers to as "Fluid Reality"
    which is basically material physics such as hair, clothing, and water. All
    of that happens without burdening the "Parent" GPU, and it saves memory
    bandwidth at the same time since these tasks can be performed on the eDRAM.

    *XENOS's Parent GPU has 232 million transistors and the Daughter GPU has 150
    million transistors (80 million is for the eDRAM), for a grand total of
    around 382 million transistors. XENOS's "Parent" GPU is manufactured by TSMC
    using their 90nm manufacturing process and the "Daughter" GPU is
    manufactured by NEC using their 90nm manufacturing process. The "Split
    Processor" design allows XENOS to improve yield during manufacturing and
    also helps with heat output/power consumption issues.

    *XENOS uses deferred tile based rendering (some of you would be familiar
    with this as the Dreamcast used this rendering technique). This is how it
    will be able to drive high resolution displays with 4xFSAA active, and
    there are some additional performance enhancing technologies that take
    advantage of the tile based rendering.
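
To put rough numbers on the tiling point: the sketch below estimates how many
tiles a frame has to be split into to fit in the 10MB of eDRAM. The 1280x720
resolution and the 4 bytes of colour plus 4 bytes of depth per sample are my
assumptions, not figures from the post, and the function name is made up for
illustration.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # 10MB of eDRAM, as stated in the post


def tiles_needed(width, height, msaa_samples, color_bytes=4, depth_bytes=4):
    """Return how many eDRAM-sized tiles a render target must be split into.

    color_bytes and depth_bytes are assumed per-sample sizes (32-bit colour,
    32-bit depth/stencil); they are not given in the post.
    """
    bytes_per_pixel = msaa_samples * (color_bytes + depth_bytes)
    total_bytes = width * height * bytes_per_pixel
    return math.ceil(total_bytes / EDRAM_BYTES)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        count = tiles_needed(1280, 720, samples)
        print(f"1280x720 at {samples}x AA: {count} tile(s)")
```

Under those assumptions a 4x anti-aliased 720p frame is roughly 28MB of
sample data, so it would have to be rendered as 3 tiles.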

    *XENOS contains 16 texture fetch and 16 vertex fetch units. Each of the
    texture units can produce one bilinear sample per clock, and if trilinear
    or anisotropic filtering is requested, each unit will loop through multiple
    samples until the target sampling and filtering level is complete
    (basically this means there is less performance loss when you are using
    trilinear or anisotropic texture filtering). This is done OUTSIDE of the
    shader units, which improves performance by increasing efficiency.
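
A simplified way to picture the looping: treat every filtered fetch as some
number of bilinear samples, one per clock per unit. The cost model here
(trilinear = 2 bilinear samples, N-tap anisotropic = N times that) is my own
assumption for illustration, not something ATI has confirmed.

```python
TEXTURE_UNITS = 16  # texture fetch units, as stated in the post


def cycles_per_fetch(trilinear=False, aniso_taps=1):
    """Clocks one unit spends on one filtered fetch in this toy model.

    Assumptions: one bilinear sample per clock; trilinear blends two mip
    levels (2 bilinear samples); N-tap anisotropic repeats that N times.
    """
    if aniso_taps < 1:
        raise ValueError("aniso_taps must be at least 1")
    return aniso_taps * (2 if trilinear else 1)


def fetches_per_clock(trilinear=False, aniso_taps=1):
    """Filtered fetches completed per clock across all texture units."""
    return TEXTURE_UNITS / cycles_per_fetch(trilinear, aniso_taps)


if __name__ == "__main__":
    print("bilinear:", fetches_per_clock())
    print("trilinear:", fetches_per_clock(trilinear=True))
    print("trilinear + 4-tap aniso:", fetches_per_clock(True, 4))
```

The point is only that the extra samples cost texture-unit clocks rather than
shader instructions.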

    *XENOS is capable of processing 64 threads simultaneously. This is to make
    sure that all elements are being utilized, so there is minimal or no
    stalling of the graphics architecture. Even if one thread is waiting for a
    texture sample to be fetched, the ALU does not stall, because it works on
    something else from another thread in the meantime. This effectively
    hides tasks that would normally have a large latency penalty attached to
    them. ATI suggests that their testing achieves an average of 95% efficiency
    of the shader array in general purpose graphics usage conditions. The
    throughput is said to be two loops, two texture instructions, 6 ALU
    instructions, per pixel, per cycle at XENOS's peak fill rate.
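
The latency hiding can be illustrated with a toy simulation of a single ALU
switching between threads. Every number in it (4 clocks of ALU work between
fetches, 60 clocks of fetch latency) is hypothetical and chosen only to show
the shape of the effect, not to model the real hardware.

```python
def alu_utilization(num_threads, alu_cycles=4, fetch_latency=60,
                    total_cycles=10_000):
    """Fraction of clocks a single ALU is busy in a toy thread-switching model.

    Each thread repeats: alu_cycles of ALU work, then a texture fetch that
    takes fetch_latency clocks, during which the thread cannot run.
    All parameter values are illustrative assumptions.
    """
    if num_threads < 1:
        raise ValueError("need at least one thread")
    ready_at = [0] * num_threads  # clock at which each thread may run again
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        # Pick the first thread whose texture fetch has already returned.
        runnable = next(
            (t for t in range(num_threads) if ready_at[t] <= cycle), None)
        if runnable is None:
            cycle += 1  # every thread is waiting on a fetch: the ALU stalls
            continue
        run = min(alu_cycles, total_cycles - cycle)
        busy += run
        cycle += run
        ready_at[runnable] = cycle + fetch_latency
    return busy / total_cycles


if __name__ == "__main__":
    for threads in (1, 8, 16, 64):
        print(f"{threads:2d} threads: {alu_utilization(threads):.0%} busy")
```

With one thread the ALU idles through nearly every fetch; once enough threads
are in flight there is always one ready to run and the stalls disappear.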

    *XENOS has 48 ALUs that are 16-way, and are grouped into 3 arrays of SIMD
    ALUs. Each ALU can co-issue a Vector4 and a scalar instruction
    simultaneously, essentially a "5D" operation per cycle (basically 2 Vec4 and
    2 scalar instructions per cycle per ALU). The ALUs process everything in
    FP32 precision with no internal partial precision requirements for FP16.
    Additionally, each of the 48 ALUs contains extra logic that performs all
    the pixel shader interpolation calculations. ATI suggests that this
    basically equates to an extra 33% pixel shader computational capacity.
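
The ALU figures multiply out as follows. The counts come from the post; the
clock speed is not mentioned there, so it is left as a parameter, and the
500MHz used in the example is an assumption on my part.

```python
ARRAYS = 3            # SIMD arrays, from the post
ALUS_PER_ARRAY = 16   # 48 ALUs split across 3 arrays
VEC4_LANES = 4        # one Vector4 instruction per ALU per cycle
SCALAR_LANES = 1      # co-issued with one scalar instruction

alus = ARRAYS * ALUS_PER_ARRAY
components_per_alu = VEC4_LANES + SCALAR_LANES      # the "5D" operation
components_per_cycle = alus * components_per_alu


def peak_component_ops_per_second(clock_hz):
    """Peak component operations per second at an assumed clock speed."""
    return components_per_cycle * clock_hz


if __name__ == "__main__":
    print("ALUs:", alus)
    print("component ops per cycle:", components_per_cycle)
    # 500MHz is an assumed clock, not a figure taken from the post.
    print("peak at 500MHz:", peak_component_ops_per_second(500_000_000))
```

This counts each vector or scalar component as one operation and ignores the
extra interpolation logic mentioned above.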

    *Developers can choose to allow XENOS to automatically handle load balancing
    of the ALUs for their applications or take direct control of the ALUs. The
    load balancing is based on an algorithm that affects prioritization of the
    vertex and pixel shader programs. ATI believes that the algorithm gives
    near-optimal throughput and expects only a few developers to actually look
    into changing the weightings of the algorithm. They also state that there
    will never be an unused shader array or texture sampler if there are
    threads available to use it.
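
ATI's actual balancing algorithm is not described anywhere in this post, so
the following is purely an illustration of what "weighted prioritization"
between vertex and pixel work could look like. The function, its parameters
and the proportional-split rule are all hypothetical.

```python
def assign_arrays(pending_vertex, pending_pixel, arrays=3,
                  vertex_weight=1.0, pixel_weight=1.0):
    """Split shader arrays between vertex and pixel work (hypothetical rule).

    Arrays are shared in proportion to the weighted backlog of each kind of
    thread. An array is never left idle while any work is pending, and never
    given to a queue that is empty.
    """
    if vertex_weight <= 0 or pixel_weight <= 0:
        raise ValueError("weights must be positive")
    if pending_vertex == 0 and pending_pixel == 0:
        return {"vertex": 0, "pixel": 0}
    if pending_vertex == 0:
        return {"vertex": 0, "pixel": arrays}
    if pending_pixel == 0:
        return {"vertex": arrays, "pixel": 0}
    v = pending_vertex * vertex_weight
    p = pending_pixel * pixel_weight
    vertex_arrays = round(arrays * v / (v + p))
    return {"vertex": vertex_arrays, "pixel": arrays - vertex_arrays}


if __name__ == "__main__":
    print(assign_arrays(100, 200))                     # default weights
    print(assign_arrays(100, 100, vertex_weight=2.0))  # favour vertex work
    print(assign_arrays(0, 50))                        # pixel work only
```

Changing the weights is the kind of knob the post says only a few developers
are expected to touch.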

    *XENOS capabilities... 4K instruction slots (shared between VS and PS),
    greater than 500K maximum number of instructions executed, has instruction
    prediction, 64 temporary registers, 512 constant registers (shared between
    VS and PS), has static flow control, has dynamic flow control, has a dynamic
    flow control depth of 4 or 2^23 if nesting, has vertex texture fetch
    (dependent fetches and all formats), 32 surface shared pool where a texture
    consumes 1 entry and a vertex surface consumes 1/3 of an entry so a maximum
    of 32 texture or 96 vertex surfaces, has geometry instancing, has no
    dependent texture limits or texture instruction limits, has position
    registers, has 16 interpolated registers, has arbitrary swizzling, has
    gradient instructions, has loop count registers, and has face registers
    (2 sided lighting). What does all that mean? Don't ask... it would take too
    long to describe everything, but all this does mean it EXCEEDS VS3.0 and
    PS3.0 specifications.
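
The shared surface pool is the one item in that list that comes down to
simple arithmetic: 32 entries, 1 per texture, 1/3 per vertex entry. A minimal
check of whether a mix fits, using exact fractions to avoid rounding (the
function name is my own):

```python
from fractions import Fraction

POOL_ENTRIES = 32               # shared pool size, from the post
TEXTURE_COST = Fraction(1)      # a texture consumes 1 entry
VERTEX_COST = Fraction(1, 3)    # a vertex entry consumes 1/3 of an entry


def fits_in_pool(num_textures, num_vertex):
    """True if the given mix of textures and vertex entries fits the pool."""
    used = num_textures * TEXTURE_COST + num_vertex * VERTEX_COST
    return used <= POOL_ENTRIES


if __name__ == "__main__":
    print(fits_in_pool(32, 0))    # all textures
    print(fits_in_pool(0, 96))    # all vertex entries
    print(fits_in_pool(16, 48))   # an even mix
    print(fits_in_pool(16, 49))   # one vertex entry too many
```

So the 32 and 96 limits are the two extremes, and any mix in between trades
one texture for three vertex entries.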

    *XENOS has something called "MEMEXPORT" which will be important for shader
    programs that exceed 4000 instructions, but that is only the start of this
    particular beauty. It would take me too long to describe this feature in
    this post, but developers will absolutely love this feature...

    *XENOS is capable of processing a displacement map in a single pass (this
    basically gives free additional geometry for the object).
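
For anyone unfamiliar with displacement mapping, the basic operation is to
push each vertex along its normal by a height read from a texture. The sketch
below does that on the CPU with plain tuples; it only shows the math and says
nothing about how XENOS implements it. The function name and the scale
parameter are made up for the example.

```python
def displace(vertices, normals, heights, scale=1.0):
    """Move each vertex along its normal by its sampled height times scale.

    vertices and normals are lists of (x, y, z) tuples; heights holds one
    displacement-map sample per vertex. Purely illustrative.
    """
    if not (len(vertices) == len(normals) == len(heights)):
        raise ValueError("vertices, normals and heights must match in length")
    return [
        tuple(p + n * h * scale for p, n in zip(pos, nrm))
        for pos, nrm, h in zip(vertices, normals, heights)
    ]


if __name__ == "__main__":
    flat = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    up = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
    print(displace(flat, up, heights=[0.5, 0.25], scale=2.0))
```

A flat mesh plus a height map gives bumpy geometry without storing the extra
detail in the vertex data itself.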

    ....and a lot more than the 10 item limit that was requested by the earlier
    poster. Bottom line, XENOS is both POWERFUL and EFFICIENT... now what
    happens when you combine something that is powerful AND efficient? More
    later...
     
    Guest, Jun 20, 2005
    #1

  2. Guest

    Dennis G. Guest

    Is it compatible with x64 systems? Drivers and all?

    Dennis
     
    Dennis G., Jun 20, 2005
    #2

  3. Guest

    Xen0s* Guest

