
Revealing The Power of DirectX 11

Discussion in 'ATI' started by NV55, Jan 31, 2009.

  1. NV55

    NV55 Guest

    No matter whether we've got a low end or high end system, we all
    expect the realtime 3D revolution to continue until we achieve near
    parity with reality. The push forward is backed by many factors
    including pure hardware performance and brilliant advances in
    techniques for better approximating what we see. But there's another
    side to the equation beyond just hardware and developers: there is the
    graphics API.

    Unlike CPUs, graphics hardware (GPUs) does not have a common instruction
    set upon which tools and software can be built. In order to get the power
    of the hardware out to the public, we need a common interface that
    works no matter what GPU is underneath. It's left to the graphics
    hardware designer to take the code generated by this application
    programming interface (API) and translate it into something that their
    chip can use. Because it's the developer's single point of contact,
    the graphics API is incredibly important. It defines how much
    flexibility programmers have in using hardware and shapes the world of
    high performance realtime 3D graphics.

    Some of the key work done through a graphics API is taking descriptions
    of 3D objects in a 3D world, sending those objects and other resources
    to the hardware, and then telling the hardware what to do with them.
    There is sort of a step by step process that needs to be followed that
    we generally call a pipeline. Graphics API pipelines have different
    stages where different work is done. Here's the general structure of a
    3D graphics pipeline:

    First, vertex data (information about the position of the corners of
    shapes) is taken in and processed. Those shapes can then be further
    manipulated and re-processed if needed. After this, the 3D shapes are
    projected onto the screen and broken down into 2D fragments called
    pixels (this step is called rasterization), and then these
    pixels are each processed by looking up texture information and using
    lighting techniques and so on. When pixels are finished processing,
    they are output and displayed on the screen. And that's the mile high
    overview of how 3D graphics work.
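
    As a rough illustration of the first stages described above, here is
    a toy CPU-side sketch of vertex projection followed by rasterization.
    Everything in it is an assumption made for illustration (the names
    project, edge and rasterize, the focal length of 1, sampling at pixel
    centres); it is not how any particular API or GPU implements these
    steps.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Vec2 { float x, y; };

// Vertex stage: perspective-project a camera-space point (z > 0 is in front
// of the camera) onto a width x height pixel grid. The focal length of 1 and
// the [-1, 1] view window are illustrative assumptions.
Vec2 project(const Vec3& v, int width, int height) {
    assert(v.z > 0.0f);
    const float ndc_x = v.x / v.z;
    const float ndc_y = v.y / v.z;
    return {(ndc_x * 0.5f + 0.5f) * width,
            (1.0f - (ndc_y * 0.5f + 0.5f)) * height};
}

// Edge function: its sign tells which side of the edge a->b the point p is on.
float edge(const Vec2& a, const Vec2& b, const Vec2& p) {
    return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
}

// Rasterizer: returns one coverage flag per pixel, tested at pixel centres.
std::vector<bool> rasterize(const Vec2& a, const Vec2& b, const Vec2& c,
                            int width, int height) {
    assert(width > 0 && height > 0);
    std::vector<bool> covered(static_cast<std::size_t>(width) * height, false);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const Vec2 p{x + 0.5f, y + 0.5f};
            const float e0 = edge(a, b, p);
            const float e1 = edge(b, c, p);
            const float e2 = edge(c, a, p);
            // A pixel is inside when all three signs agree (either winding).
            const bool inside = (e0 >= 0 && e1 >= 0 && e2 >= 0) ||
                                (e0 <= 0 && e1 <= 0 && e2 <= 0);
            covered[static_cast<std::size_t>(y) * width + x] = inside;
        }
    }
    return covered;
}
```

    Each covered pixel would then be handed to the pixel processing
    stage for texturing and lighting before being written to the screen.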

    For the past dozen years (it seems longer doesn't it?), we've seen
    makers of 3D graphics hardware accelerate two very prominent APIs:
    OpenGL and DirectX.

    We recently touched on advancements tangential to OpenGL in our OpenCL
    article, but today our focus will be on DirectX. Microsoft's DirectX
    graphics API is much more heavily used in game engines than OpenGL in
    good part because DirectX tends to move much more quickly and sets the
    bar for both hardware and OpenGL in terms of feature set and
    flexibility. Which always makes upcoming versions of DirectX exciting
    to talk about: they define the future capabilities of hardware and
    expose improved tools to developers. Upcoming DirectX versions are
    glimpses into our graphical future. Currently we have a lot of DirectX
    9 and DirectX 10 games available and in development, but DirectX 11
    looms on the horizon.

    As usual, Microsoft will be trying to time the release of their next
    DirectX revision with the release of compatible graphics hardware. As
    with last time, the new version will also arrive alongside a new
    operating system: DirectX 11 will be released with Windows 7. With
    the Windows 7 Beta already under way, we expect the OS to be done some
    time this year.

    Microsoft has been rather aggressive with Windows 7 scheduling in
    light of the rejection of Vista, so it appears they are stepping up to
    the plate to get everything out sooner rather than later. There was a
    little more than 4 years between the release of DirectX 9 and DirectX
    10. As it hit the streets with Vista in January of 2007, DirectX 10
    has just turned 2 and we are already anticipating its replacement in
    the very near future. As we will learn, this speedy transition should
    be very good for DirectX 11 adoption as DirectX 10 hasn't even become
    pervasive yet: many games are still DirectX 9 only.

    But let's take a closer look at what we are talking about before we go
    any further.

    Introducing DirectX 11: The Pipeline and Features

    This is DirectX 10.

    We all remember him from our G80 launch article back in the day when
    no one knew how much Vista would really suck. Some of the shortfalls
    of DirectX 10 have been in operating system support, driver support,
    time to market issues, and other unfortunate roadblocks that kept
    developers from making full use of all the cool new features and tools
    DirectX 10 brought.

    Meet DirectX 11.

    She's much cooler than her older brother, and way hotter too. Many
    under-the-hood enhancements mean higher performance for features
    available but less used under DX10. The major changes to the pipeline
    mark revolutionary steps in graphics hardware and software
    capabilities. Tessellation (made up of the hull shader, tessellator
    and domain shader) and the Compute Shader are major developments that
    could go far in assisting developers in closing the gap between
    reality and unreality. These features have gotten a lot of press
    already, but we feel the key to DirectX 11 adoption (and thus
    exploitation) is in some of the more subtle elements. But we'll get
    into all that in due time.

    Along with the pipeline changes, we see a whole host of new tweaks and
    adjustments. DirectX 11 is actually a strict superset of DirectX 10.1,
    meaning that all of those features are completely encapsulated in and
    unchanged by DirectX 11. This simple fact means that all DX11 hardware
    will include the changes required to be DX 10.1 compliant (which only
    AMD can claim at the moment). In addition to these tweaks, we also see
    these further extensions:


    While changes in the pipeline allow developers to write programs to
    accomplish different types of tasks, these more subtle changes allow
    those programs to be more complex, higher quality, and/or more
    performant. Beyond all this, Microsoft has also gone out of its way to
    help make parallel programming a little bit easier for game
    developers.

    From Evolution to Expansion and Multi-Threading: The Mile High

    The November DirectX SDK update was the first to include some DirectX
    11 features for developers to try out. Of course, there is no DX11
    hardware yet, but what is included will run on the current DX10 setup
    with DX10 hardware under Vista and the beta Windows 7. This, combined
    with the fact that Khronos finished the OpenCL specification last
    month, marks two major developments on the path to more general purpose
    computing on the GPU. Of course, DX11 is more geared toward realtime
    3D and OpenCL is targeted at real general purpose data parallel
    programming (across multiple CPUs and GPUs) distinct from graphics,
    but these two programming APIs are major milestones in the future
    history of computing.

    There is more than just the compute shader included in DX11, and since
    our first real briefing about it at this year's NVISION, we've had the
    chance to do a little more research, reading slides and listening to
    presentations from SIGGRAPH and GameFest 2008 (from which we've
    included slides to help illustrate this article). The most interesting
    things to us are more subtle than just the inclusion of a tessellator
    or the addition of the Compute Shader. And the introduction of DX11
    will also bring benefits to owners of current DX10 and DX10.1
    hardware, provided AMD and NVIDIA keep up with appropriate driver
    support anyway.

    Many of the new aspects of DirectX 11 seem to indicate to us that the
    landscape is ripe for a fairly quick adoption especially if Microsoft
    brings Windows 7 out sooner rather than later. Adjustments to HLSL
    that should make it much more attractive to developers, the good
    transitional implications of DX10 being a subset of DX11, and changes
    that make parallel programming much easier should all go a long way to
    helping developers pick up the API quickly. DirectX 11 will be
    available for Vista, so there won't be
    as many complications from a lack of users upgrading, and Windows 7
    may also inspire Windows XP gamers to upgrade meaning a larger install
    base for developers to target as well.

    The bottom line is that while DirectX 10 promised features that could
    bring a revolution in visual fidelity and rendering techniques,
    DirectX 11 may actually deliver the goods while helping developers
    make the API transition faster than we've seen in the past. We might
    not see techniques that take advantage of the exclusive DirectX 11
    features right off the bat, but adoption of the new version of the API
    itself will go a long way to inspiring amazing advances in realtime 3D
    graphics.

    From DirectX 6 through DirectX 9, Microsoft steadily evolved their
    graphics programming API from a fixed function vehicle for setting
    state and moving data structures around to a rich, programmable
    environment enabling deep control of graphics hardware. The step from
    DX9 to DX10 was the final break in the old ways, opening up and
    expanding on the programmability of DX9 to add more depth and
    flexibility enabled by newer hardware. Microsoft also forced a shift
    in the driver model with the DX10 transition to leave the rest of the
    legacy behind and try and help increase stability and flexibility when
    using DX10 hardware. But DirectX 11 is different.

    Rather than throwing out old constructs in order to move towards more
    programmability, Microsoft has built DirectX 11 as a strict superset
    of DirectX 10/10.1, which enables some curious possibilities.
    Essentially, DX10 code will be DX11 code that chooses not to implement
    some of the advanced features. On the flipside, DX11 will be able to
    run on down level hardware. Of course, all of the features of DX11
    will not be available, but it does mean that developers can stick with
    DX11 and target both DX10 and DX11 hardware without the need for two
    completely separate implementations: they're both the same but one
    targets a subset of functionality. Different code paths will be
    necessary if something DX11 only (like the tessellator or compute
    shader) is used, but this will still definitely be a benefit in
    transitioning to DX11 from DX10.


    Running on lower spec'd hardware will be important, and this could
    make the transition from DX10 to DX11 one of the fastest we have ever
    seen. In fact, with lethargic movement away from DX9 (both by
    developers and consumers), the rush to bring out Windows 7 and slow
    adoption of Vista, we could end up looking back at DX10 as merely a
    transitional API rather than the revolutionary paradigm shift it could
    have been. Of course, Microsoft continues to push that the fastest
    route to DX11 is to start developing DX10.1 code today. With DX11 as a
    superset of DX10, this is certainly true, but developers' time will
    very likely be better spent putting the bulk of their effort into a
    high quality DX9 path with minimal DX10 bells and whistles while
    saving the truly fundamental shifts in technique made possible by DX10
    for games targeted at DX11 hardware and timeframe.

    We are especially hopeful about a faster shift to DX11 because of the
    added advantages it will bring even to DX10 hardware. The major
    benefit I'm talking about here is multi-threading. Yes, eventually
    everything will need to be drawn, rasterized, and displayed (linearly
    and synchronously), but DX11 adds multi-threading support that allows
    applications to simultaneously create resources or manage state and
    issue draw commands all from an arbitrary number of threads. This may
    not significantly speed up the graphics subsystem (especially if we
    are already very GPU limited), but it does make it much easier to
    explicitly and heavily thread a game and take advantage of the
    increasing number of CPU cores on the desktop.

    With 8 and 16 logical processor systems coming soon to a system near
    you, we need developers to push beyond the very coarse grained and
    heavy threads they are currently using that run well on two core
    systems. The cost/benefit of developing a game that is significantly
    assisted by the availability of more than 2 cores is very poor at this
    point. It is too difficult to extract enough parallelism to matter on
    quad core and beyond in most video games. But enabling simple parallel
    creation of resources and display lists by multiple threads could
    really open up opportunities for parallelizing game code that would
    otherwise have remained single threaded. Rather than one thread to
    handle all the DX state change and draw calls (or very well behaved
    and heavily synchronized threads sharing the responsibility),
    developers can more naturally create threads to manage types or groups
    of objects or parts of a world, opening up the path to the future
    where every object or entity can be managed by its own thread (which
    would be necessary to extract performance when we eventually expand
    into hundreds of logical cores).

    The fact that Microsoft has planned multi-threading support for DX11
    games running on DX10 hardware is a major bonus. The only caveat here
    is that AMD and NVIDIA will need to do a little driver work for their
    existing DX10 hardware to make this work to its fullest extent (it
    will "work" even without a driver change, just not as well). Of course,
    we expect that NVIDIA and especially AMD (as they are also a multi-
    core CPU company) will be very interested in making this happen. And,
    again, this provides major incentives for game developers to target
    DX11 even before DX11 hardware is widely available or deployed.

    All this is stacking up to make DX11 look like the go-to technology.
    The additions to and expansions of DX10, the timing and the ability to
    run on down level hardware could create a perfect storm for a
    relatively quick uptake. By relatively quick, we are still looking at
    years for pervasive use of DX11, but we expect that the attractiveness
    of the new features and benefit to the existing install base will
    provide a bigger motivation for game developers to transition than
    we've seen before.

    If only Microsoft would (and could) back-port DX11 to Windows XP,
    there would be no reason for game developers to maintain legacy code
    paths. I know, I know, that'll never (and can't by design) happen.
    While we wholeheartedly applaud the idea of imposing strict minimum
    requirements on hardware for a new operating system, unnecessarily
    cutting off an older OS at the knees is not the way to garner support.
    If Windows 7 ends up being a more expensive Vista in a shiny package,
    we may still have some pull towards DX9, especially for very
    mainstream or casual games that tend to lag a bit anyway (and as some
    readers have pointed out because consoles will still be DX9 for the
    next few years). It's in these incredibly simple but popular games and
    console games that the true value of amazing realtime 3D graphics
    could be brought to the general computing populace, but craptacular
    low end hardware and limiting API accessibility on popular operating
    systems further contribute to the retardation of graphics in the

    But that's the overview. Let's take some time to drill down a bit
    further into some of the technology.

    Drilling Down: DX11 And The Multi-Threaded Game Engine

    In spite of the fact that multithreaded programming has been around
    for decades, mainstream programmers didn't start focusing on parallel
    programming until multicore CPUs started coming along. Much general
    purpose code is straight forward as a single thread; extracting
    performance via parallel programming can be difficult and isn't always
    obvious. Even with talented programmers, Amdahl's Law is a bitch: your
    speed up from parallelization is limited by the percent of code that
    is necessarily sequential.
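
    To put numbers on that, Amdahl's Law is usually written as
    speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work
    that can run in parallel and n is the number of cores. A minimal
    sketch follows; the helper name amdahl_speedup and the 80% figure
    used below are illustrative assumptions, not measurements from any
    game.

```cpp
#include <cassert>

// Amdahl's Law: upper bound on speedup when a fraction `parallel_fraction`
// of the work can be spread across `cores` and the rest stays sequential.
// `amdahl_speedup` is a hypothetical helper name used only for this sketch.
double amdahl_speedup(double parallel_fraction, int cores) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(cores >= 1);
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / cores);
}
```

    If 80% of the per-frame CPU work could be parallelized,
    amdahl_speedup(0.8, 4) works out to 2.5x and amdahl_speedup(0.8, 16)
    to 4x, and no number of cores could ever push it past 5x.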

    Currently, in game development, rendering is one of those
    "necessarily" sequential tasks. DirectX 10 isn't set up to
    appropriately handle multiple threads all throwing commands at the
    GPU. That doesn't mean parallelization of renders can't happen, but it
    does limit speed up because costly synchronization techniques or
    management threads need to be implemented in order to make sure
    nothing steps out of line. All this limits the benefit of
    parallelization and discourages programmers from trying too hard.
    After all, it's a better idea to put more of your effort into areas
    where performance can be improved more significantly. John Carmack put
    it really well once, but I can't remember the quote. And I'm doing too
    much benchmarking to go look for it now. :p

    No matter what anyone does, some stuff in the renderer will need to be
    sequential. Programs, textures and resources must be loaded up,
    geometry happens before pixel processing, draw calls intended to be
    executed while a certain state is active must have that state set
    first and not changed until completion. Even in such a massively
    parallel machine, order must be maintained for many things. But order
    doesn't /always/ matter.

    By making more things thread-safe through an extended device interface
    using multiple contexts, and by making a lot of synchronization overhead
    the responsibility of the API and/or graphics driver, Microsoft has
    enabled game developers to more easily thread not only their
    rendering code, but their game code as well. These things
    will also work on DX10 hardware running on a system with DX11, though
    some missing hardware optimizations will reduce the performance
    benefit. But the fundamental ability to write code differently will go
    a long way to getting programmers more used to and better at
    parallelization. Let's take a look at the tools available to
    accomplish this in DX11.


    First up is free threaded asynchronous resource loading. That's a bit
    of a mouthful, but this feature gives developers the ability to upload
    programs, textures, state objects, and all resources in a thread-safe
    way and, if desired, concurrent with the rendering process. This
    doesn't mean that all this stuff will get pushed up in parallel with
    rendering, as the driver will manage what gets sent to the GPU and
    when based on priority, but it does mean the developer no longer has
    to think about synchronizing or manually prioritizing resource
    loading. Multiple threads can start loading whatever resources they
    need whenever they need them. The fact that this can also be done
    concurrently with rendering could improve performance for games that
    stream in data for massive open worlds in addition to enabling
    multithreaded opportunities.

    In order to enable this and other threading, the D3D device interface
    is now split into three separate interfaces: the Device, the Immediate
    Context, and the Deferred Context. Resource creation is done through
    the Device. The Immediate Context is the interface for setting device
    state, draw calls, and queries. There can only be one Device and one
    Immediate Context. The Deferred Context is another interface for state
    and draw calls, but many can exist in one program and can be used as
    the per-thread interface (Deferred Contexts themselves are thread
    unsafe though). Deferred Contexts and the free threaded resource
    creation through the device are where DX11 gets its multithreaded
    support.


    Multiple threads submit state and draw calls to their Deferred Context
    which compiles a display list that is eventually executed by the
    Immediate Context. Games will still need a render thread, and this
    thread will use the Immediate Context to execute state and draw calls
    and to consume the display lists generated by Deferred Contexts. In
    this way, the ultimate destination of all state and draw calls is the
    Immediate Context, but fine grained synchronization is handled by the
    API and the display driver so that parallel threads can be better used
    to contribute to the rendering process. Some limitations on Deferred
    Contexts include the fact that they cannot query the device and they
    can't download or read back anything from the GPU. Deferred Contexts
    can, however, consume the display lists generated by other Deferred
    Contexts.
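
    The flow described above can be modelled with a small toy in plain
    C++. To be clear, this is not the Direct3D 11 interface: the class
    and function names (DeferredContext, ImmediateContext, render_frame)
    and the use of strings as stand-ins for state and draw calls are all
    assumptions made so the sketch can run anywhere. It only shows the
    shape of the idea: each worker thread records into its own context,
    and a single render thread replays the finished lists in order.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Toy model only: these types imitate the roles described above and are NOT
// the Direct3D 11 interfaces. All names here are hypothetical.
using Command = std::string;               // stands in for a state change or draw call
using CommandList = std::vector<Command>;  // stands in for a recorded display list

// One per worker thread; not thread-safe, so never shared between threads.
class DeferredContext {
public:
    void record(Command cmd) { pending_.push_back(std::move(cmd)); }
    // Hands the recorded commands over and resets the context for reuse.
    CommandList finish() { return std::exchange(pending_, {}); }
private:
    CommandList pending_;
};

// Exactly one exists; only the render thread touches it.
class ImmediateContext {
public:
    void execute(const CommandList& list) {
        submitted_.insert(submitted_.end(), list.begin(), list.end());
    }
    const CommandList& submitted() const { return submitted_; }
private:
    CommandList submitted_;
};

// Records `groups` display lists in parallel, then replays them in a fixed
// order on the calling (render) thread.
CommandList render_frame(int groups, int draws_per_group) {
    assert(groups >= 0 && draws_per_group >= 0);
    std::vector<CommandList> lists(groups);
    std::vector<std::thread> workers;
    for (int g = 0; g < groups; ++g) {
        workers.emplace_back([&lists, g, draws_per_group] {
            DeferredContext ctx;  // private to this thread
            ctx.record("set_state group " + std::to_string(g));
            for (int d = 0; d < draws_per_group; ++d)
                ctx.record("draw " + std::to_string(g) + ":" + std::to_string(d));
            lists[g] = ctx.finish();  // each thread writes only its own slot
        });
    }
    for (auto& w : workers) w.join();

    ImmediateContext immediate;
    for (const auto& list : lists) immediate.execute(list);  // deterministic order
    return immediate.submitted();
}
```

    Note that the recording order across threads does not matter here;
    only the order in which the render thread executes the lists
    determines what the GPU would see.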

    The end result of all this is that the future will be more parallel
    friendly. As two and four core CPUs become more and more popular and 8
    and 16 (logical) core CPUs are on the horizon we need all the help we
    can get when trying to extract performance from parallelism. This is a
    good move for DirectX and we hope it will help push game engines to
    more fully utilize more than 2 or even 4 cores when the time comes.

    Going Deeper: The DX11 Compute Shader and OpenCL/OpenGL

    Many developers are excited about the added flexibility of the Compute
    Shader (also referred to as the CS). This addition to the pipeline
    steps further from a render-centric API and enables more general
    purpose algorithms. We see added flexibility in both the type of
    operations that can be performed on data and the type of data that can
    be operated on.

    In other pipeline stages, we see limitations imposed that are designed
    to speed up execution that get in the way of general purpose code.
    Although we can shoehorn general purpose algorithms into a pixel
    shader program, we don't have the freedom to use data structures like
    trees, sharing data between pixels (and thus threads) is difficult and
    costly, and we have to go through the motions of drawing triangles and
    mapping solutions onto this.

    Enter DirectX11 and the CS. Developers have the option to pass data
    structures over to the Compute Shader and run more general purpose
    algorithms on them. The Compute Shader, like the other fully
    programmable stages of the DX10 and DX11 pipeline, will share a single
    set of physical resources (shader processors).

    This hardware will need to be a little more flexible than it currently
    is. When it runs CS code it will have to support random reads and
    writes and irregular arrays (rather than simple streams or fixed size
    2D arrays), multiple outputs, direct invocation of individual or
    groups of threads as per the programmer's needs, 32k of shared register
    space and thread group management, atomic instructions,
    synchronization constructs and the ability to perform unordered IO
    operations.

    At the same time, the CS loses some features as well. Each thread
    is no longer treated as a pixel, so the association with geometry is
    lost (unless specifically passed in a data structure). This means
    that, although CS programs can still use texture samplers, trilinear
    LOD calculations are not automatic (LOD must be specified).
    Additionally, depth culling, antialiasing, alphablending, and other
    operations that have no meaning to generic data cannot be performed
    inside a CS program.

    The range of new applications opened up by the CS is practically
    unlimited. But the most immediate interest will come from game
    developers looking to augment their graphics engines with fancy
    techniques not possible in the Pixel Shader. Some of these
    applications include A-Buffer techniques to allow very high quality
    antialiasing and order independent transparency, more advanced
    deferred shading techniques, advanced post processing effects and
    convolution, FFTs (Fourier transforms) for frequency domain
    operations, and summed area tables.


    Beyond the rendering specific applications, game developers may wish
    to do things like IK (inverse kinematics), physics, AI, and other
    traditionally CPU specific tasks on the GPU. Having this data on the
    GPU by performing calculations in the CS means that the data is more
    quickly available for use in rendering and some algorithms may be much
    faster on the GPU as well. It might even be an option to run things
    like AI or physics on both the GPU and the CPU if algorithms that
    always yield the same result on both types of processors can be found
    (which would essentially substitute compute power for bandwidth).

    Even though the code will run on the same hardware, PS and CS code
    will perform very differently based on the algorithms being
    implemented. One of the interesting things to look at is exposure and
    histogram data often used in HDR rendering. Calculating this data in
    the PS requires several passes and tricks to take all the pixels and
    either bin them or average them. Despite the fact that sharing data is
    going to slow things down quite a bit, sharing data can be much faster
    than running many passes and this makes the CS an ideal stage for such
    algorithms.
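
    A CPU-side analogy may help show why a single pass with shared data
    is attractive for histograms. In the sketch below, ordinary threads
    stand in for CS threads and std::atomic counters stand in for shared
    memory updated with atomic instructions. The function name, the bin
    count of 4 and the assumption that luminance lies in [0, 1] are all
    illustrative choices, and this is plain C++ rather than shader code.

```cpp
#include <algorithm>
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kBins = 4;  // illustrative bin count

// Single-pass histogram: every worker reads its slice of the luminance
// values (assumed to lie in [0, 1]) and bumps a shared bin with an atomic
// add, so no per-thread partial results have to be merged afterwards.
std::array<unsigned, kBins> luminance_histogram(const std::vector<float>& luma,
                                                unsigned workers) {
    assert(workers >= 1);
    std::array<std::atomic<unsigned>, kBins> bins{};  // zero-initialised
    std::vector<std::thread> pool;
    const std::size_t chunk = (luma.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(luma.size(), w * chunk);
        const std::size_t end = std::min(luma.size(), begin + chunk);
        pool.emplace_back([&luma, &bins, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                // Clamp out-of-range input so it lands in the first or last bin.
                const float clamped = std::min(1.0f, std::max(0.0f, luma[i]));
                const std::size_t bin =
                    std::min(kBins - 1, static_cast<std::size_t>(clamped * kBins));
                bins[bin].fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& t : pool) t.join();

    std::array<unsigned, kBins> result{};
    for (std::size_t b = 0; b < kBins; ++b) result[b] = bins[b].load();
    return result;
}
```

    The atomic adds do cost something when many threads hit the same
    bin, which is the "sharing data is going to slow things down" part;
    the win is that the data is walked once instead of once per pass.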

    A while back we took a look at OpenCL, and we know that OpenCL will be
    able to share data structures with OpenGL. We haven't yet gotten a
    developer's take on comparing OpenCL and the DX11 CS, but at first
    blush it seems that the possibilities opened up for game developers
    and graphics processing with DX11 and the Compute Shader will also be
    possible with OpenGL+OpenCL. Although the CS can be used as a general
    purpose hardware accelerated GPU computing interface, OpenCL is
    targeted more at that arena and its independence from Microsoft and
    DirectX will likely mean wider adoption as a GPU compute language for
    general purpose tasks.

    The use of OpenGL has declined significantly in the game developer
    community over the last five years. While OpenCL may enable DX11 like
    applications to be written in combination with OpenGL, it is more
    likely that this will be the venue of workstation applications like
    CAD/CAM and simulations that require visualization. While I'm a fan of
    OpenGL myself, I don't see the flexibility of OpenCL as a significant
    boon to its adoption in game engines.

    So What's a Tessellator?

    This has been covered before now in other articles about DirectX 11,
    but we first touched on the subject back with the R600 launch. Both
    R6xx and R7xx hardware have tessellators, but since these are
    proprietary implementations, they won't be directly compatible with
    DirectX 11 which uses a much more sophisticated setup. While neither
    AMD's tessellator nor the DX11 tessellator itself is programmable, DX11
    includes programmable input to and output of the tessellator (TS)
    through two
    additional pipeline stages called the Hull Shader (HS) and the Domain
    Shader (DS).


    The tessellator can take coarse shapes and break them up into smaller
    parts. It can also take these smaller parts and reshape them to form
    geometry that is much more complex and that more closely approximates
    reality. It can take a cube and turn it into a sphere with very little
    overhead and much lower space requirements. Quality, performance and
    manageability benefit.


    The Hull Shader takes in patches and control points and outputs data
    on how to configure the tessellator. Patches are a new primitive (like
    vertices and pixels) that define a segment of a plane to be
    tessellated. Control points are used to define the parametric shape of
    the desired surface (like a curve or something). If you've ever used
    the pen tool in Photoshop, then you know what control points are:
    these just apply to surfaces (patches) instead of lines. The Hull
    Shader uses the control points to determine how to set up the
    tessellator and then passes them forward to the Domain Shader.


    The tessellator just tessellates: it breaks up patches fed to it by the
    Hull Shader based on the parameters set by the Hull Shader per patch.
    It outputs a stream of points to the Domain Shader which then needs to
    finish up the process. While programmers must write HS programs for
    their code, there isn't any programming required for the TS. It's just
    a fixed function block that processes input based on parameters.


    The Domain Shader takes points generated by the tessellator and
    manipulates them to form the appropriate geometry based on control
    points and/or displacement maps. It performs this manipulation by
    running developer designed DS programs which can manipulate how the
    newly generated points are further shifted or displaced based on
    control points and textures. The Domain Shader, after processing a
    point, outputs a vertex. These vertices can be further processed by the
    Geometry Shader which can also feed them back up to the Vertex Shader
    using stream out functionality. More likely than heading back up for a
    second pass, we will probably see most output of the Domain Shader
    head straight on to rasterization so that its geometry can be broken
    down into screen space fragments for Pixel Shader processing.
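
    The division of labor between the fixed function tessellator and the
    Domain Shader can be sketched in a few lines of ordinary C++. This
    is only a toy under stated assumptions: a four-corner quad patch
    stands in for whatever control points the Hull Shader passes along,
    a single integer factor stands in for the Hull Shader's per-patch
    tessellation settings, and a constant height stands in for a
    displacement map lookup. None of the names below come from DirectX.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct DomainPoint { float u, v; };  // parametric location inside a patch

// A quad patch with four corner control points (a deliberately simple
// stand-in for the higher-order patches a real Hull Shader could pass on).
struct Patch { Vec3 p00, p10, p01, p11; };

// Fixed-function step: split the unit square into factor x factor cells and
// emit the (factor + 1)^2 grid points. It knows nothing about the surface.
std::vector<DomainPoint> tessellate(int factor) {
    assert(factor >= 1);
    std::vector<DomainPoint> points;
    for (int j = 0; j <= factor; ++j)
        for (int i = 0; i <= factor; ++i)
            points.push_back({static_cast<float>(i) / factor,
                              static_cast<float>(j) / factor});
    return points;
}

// Programmable step: turn one parametric point into a vertex by bilinear
// interpolation of the control points, then push it along +z by `height`
// (standing in for a displacement-map lookup).
Vec3 domain_shader(const Patch& patch, DomainPoint d, float height) {
    auto lerp = [](const Vec3& a, const Vec3& b, float t) {
        return Vec3{a.x + (b.x - a.x) * t,
                    a.y + (b.y - a.y) * t,
                    a.z + (b.z - a.z) * t};
    };
    Vec3 v = lerp(lerp(patch.p00, patch.p10, d.u),
                  lerp(patch.p01, patch.p11, d.u), d.v);
    v.z += height;
    return v;
}
```

    Raising the factor from 2 to 64 changes nothing in the stored
    patch data; only the number of generated points grows, which is
    where the memory savings come from.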


    That covers the basics of what the tessellator can do and how it
    does it. But do you find yourself wondering: "self, can't the
    Geometry Shader just be used to create tessellated surfaces and move
    the resulting vertices around?" Well, you would be right. That is
    technically possible, but not practical at this point. Let's dive in
    to that a bit more.

    Tessellation: Because The GS Isn't Fast Enough

    Microsoft and AMD tend to get the most excited about tessellation
    whenever the topic of DX11 comes up. AMD jumped on the tessellation
    bandwagon long ago, and perhaps it does make sense for consoles like
    the Xbox 360. Adding fixed function hardware to quickly and
    efficiently handle a task that improves memory footprint has major
    advantages in the living room. We still aren't sold on the need for a
    tessellator on the desktop, but who's to argue with progress.

    Or is it really progressive? The tessellator itself is fixed function
    rather than programmable. Sure, the input to and output of the
    tessellator can be manipulated a bit through the Hull Shader and
    Domain Shader, but the heart of the beast is just not that flexible.
    The Geometry Shader is the programmable block in the pipeline that is
    capable of tessellation as well as much more, but it just doesn't have
    the power to do tessellation on any useful scale. So while most
    everything has been moving towards programmability in the rendering
    pipe, we have sort of a step backward here. But why?

    The argument between fixed function and programmable hardware is
    always one of performance versus flexibility and usefulness. In the
    beginning, fixed function was necessary to get the desired
    performance. As time went on, it became clear that adding in more
    fixed function hardware to graphics chips just wasn't feasible. The
    transistors put into specialized hardware just go unused if developers
    don't program to take advantage of it. This drove a shift toward
    architectures where expanding the pool of compute resources that could
    be shared and used for many different tasks became a much more
    attractive way to go. In the general case anyway. But that doesn't
    mean that fixed function hardware doesn't have its place.

    We do still have the problem that all the transistors put into the
    tessellator are worthless unless developers take advantage of the
    hardware. But the reason it makes sense is that the ROI (return on
    investment: what you get for what you put in) on those transistors is
    huge if developers do take advantage of the hardware: it's much easier
    to get huge tessellation performance out of a fixed function
    tessellator than to put the necessary resources into the Geometry
    Shader to allow it to be capable of the same tessellation performance
    programmatically. This doesn't mean we'll start to see a renaissance
    of fixed function blocks in our graphics hardware, just that
    significantly advanced features going forward may still require the
    sacrifice of programmability in favor of early adoption of a feature.
    The majority of tasks will continue to be enabled in a flexible
    programmable way, and in the future we may see more flexibility
    introduced into the tessellator until it becomes fully programmable as
    well (or ends up just being merged into some future version of the
    Geometry Shader).


    Now don't let this technical assessment of fixed function tessellation
    make you think we aren't interested in reaping the benefits of the
    tessellator. Currently, artists need to create different versions of
    their objects for different LODs (Level of Detail -- reducing or
    increasing complexity as the object moves further or nearer the
    viewer), and geometry simulation through texturing at each LOD needs
    to be done by pixel shaders. This requires extra work from both
    artists and programmers and costs a good bit in terms of performance.
    There are also some effects that can only be done with more geometry.


    Tessellation is a great way to get that geometry in there for more
    detail, shadowing, and smooth edges. High geometry also allows really
    cool displacement mapping effects. Currently, much geometry is
    simulated through textures with techniques like bump mapping or
    parallax occlusion mapping. Even with high geometry, we will want to
    have large normal maps for our lighting algorithms to use, but we
    won't need to do so much work to make things like cracks, bumps,
    ridges, and small detail geometry appear to be there when they
    aren't, because we can just tessellate and displace in a single pass
    through the pipeline. This is fast, efficient, and can produce very
    detailed effects while freeing up pixel shader resources for other
    uses. With tessellation, artists can create one subdivision surface
    that gets a dynamic LOD essentially free of charge with a simple hull
    shader, and a displacement map applied in the domain shader will save
    a lot of work, increase quality, and improve performance quite a bit.
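
    The hull shader / domain shader split described above can be
    sketched on the CPU. This is only an illustration, not shader code:
    the function names, the near/far range, the linear falloff, and the
    cap of 64 are all assumptions made for this sketch.

```python
def tess_factor(distance, near=1.0, far=100.0, max_factor=64.0):
    """Hull-shader-style LOD choice: subdivide more when the patch is near.

    The linear mapping and the default range are assumptions for this sketch.
    """
    # Clamp the distance into [near, far], then map it to [max_factor, 1].
    t = (min(max(distance, near), far) - near) / (far - near)
    return max_factor + t * (1.0 - max_factor)


def displace(position, normal, height, scale=0.1):
    """Domain-shader-style displacement: push a vertex along its normal.

    `height` stands in for a value sampled from a displacement map.
    """
    return tuple(p + n * height * scale for p, n in zip(position, normal))
```

    A patch right next to the viewer gets the full factor and one at or
    beyond the far distance falls back to a factor of 1, so a single
    source surface covers every LOD.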


    If developers adopt tessellation, we could see cool things, and with
    the move to DX11 class hardware both NVIDIA and AMD will be making
    parts with tessellation capability. But we may not see developers just
    start using tessellation (or the compute shader for that matter) right
    away. DirectX 11 will run on down-level hardware, and at the release
    of DX11 we will already have a huge number of cards on the market
    capable of running a subset of DX11. That subset brings with it a
    better, more refined programming language in the new version of HLSL
    and seamless parallelization optimizations. So we will very likely
    see the first DX11 games only implementing features that can run
    completely on DX10 hardware.

    Of course, at that point developers can be fully confident of
    exploiting all the aspects of DX10 hardware, which they still aren't
    completely taking advantage of. Many people still want and need a DX9
    path because of Vista's failure, which means DX10 code tends to be
    more or less an enhanced DX9 path rather than something fundamentally
    different. So when DirectX 11 finally debuts, we will start to see
    what developers could really do with DX10.

    Certainly there will be developers experimenting with tessellation,
    but this will probably just be simple amplification to get rid of
    those jagged edges around curved surfaces at first. It will take time
    for the really advanced tessellation techniques everyone is excited
    about to come to fruition.
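
    To get a feel for what "simple amplification" means in numbers,
    here is a rough sketch that uniformly splits one triangle into n
    segments per edge. The function name and the subdivision pattern are
    assumptions for illustration; this is not the pattern the DX11
    tessellator actually generates.

```python
def subdivide_triangle(n):
    """Split a triangle into n segments per edge.

    Returns (vertices, triangles). Vertices are barycentric coordinates
    (u, v, w) with u + v + w == 1; triangles are index triples.
    """
    index = {}
    vertices = []
    for i in range(n + 1):
        for j in range(n + 1 - i):
            index[(i, j)] = len(vertices)
            vertices.append((i / n, j / n, (n - i - j) / n))

    triangles = []
    for i in range(n):
        for j in range(n - i):
            # "Upward" triangle of this cell.
            triangles.append((index[(i, j)], index[(i + 1, j)], index[(i, j + 1)]))
            # "Downward" triangle, present everywhere except along the last edge.
            if i + j < n - 1:
                triangles.append(
                    (index[(i + 1, j)], index[(i + 1, j + 1)], index[(i, j + 1)])
                )
    return vertices, triangles
```

    One input triangle becomes n * n output triangles with
    (n + 1) * (n + 2) / 2 vertices, so even modest factors multiply the
    geometry load quickly.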

    One Last Thing and Closing Thoughts

    The final bit of DX11 we'll touch on is the update to HLSL (MS's High
    Level Shading Language) in version 5.0, which brings some very
    developer-friendly adjustments. While HLSL has always been similar in
    syntax to C, 5.0 adds support for classes and interfaces. We still
    don't get to use pointers though.

    These changes are being made because of the sheer size of shader code.
    Programmers and artists need to build or generate either a single
    massive shader or tons of smaller shader programs for any given game.
    These code resources are huge and can be hard to manage without OOP
    (Object Oriented Programming) constructs. But there are some
    differences from how things work in other OOP languages. For instance,
    there is no need for memory management (because there are no pointers)
    or constructors / destructors in HLSL. Tasks like initialization are
    handled through updates to constant buffers, which generally reflect
    member data.


    Aside from the programmability aspect, classes and interfaces were
    added to support dynamic shader linkage to combat the intricacy of
    developing with huge numbers of resources and effects. Dynamic linking
    allows the application to decide at runtime what shaders to compile
    and link and enables interfaces to be left ambiguous until runtime. At
    runtime, shaders are dynamically linked, and based on what is linked,
    all possible function bodies are then compiled and optimized. Compiled
    hardware-native code isn't inlined until the appropriate SetShader
    function is called.
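
    The idea of leaving an interface open until the shader is set can be
    mimicked in ordinary Python. This is an analogy only: the class
    names and the set_shader method are hypothetical stand-ins, not the
    HLSL syntax or the DirectX API.

```python
from abc import ABC, abstractmethod


class Light(ABC):
    """Plays the role of an HLSL interface: a slot with no body chosen yet."""

    @abstractmethod
    def illuminate(self, n_dot_l):
        ...


class DiffuseLight(Light):
    def __init__(self, intensity):
        # Stands in for member data that would be backed by a constant buffer.
        self.intensity = intensity

    def illuminate(self, n_dot_l):
        return self.intensity * max(n_dot_l, 0.0)


class AmbientLight(Light):
    def __init__(self, level):
        self.level = level

    def illuminate(self, n_dot_l):
        return self.level


class Shader:
    """Shader body written against the interface and bound at set time."""

    def __init__(self):
        self.light = None  # interface slot left open until runtime

    def set_shader(self, light):
        # Hypothetical analogue of choosing the concrete class when the
        # shader is set.
        self.light = light

    def shade(self, albedo, n_dot_l):
        return albedo * self.light.illuminate(n_dot_l)
```

    The shade body is written once; which lighting code it runs is
    decided only when set_shader is called.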


    The flexibility this provides will enable development of much more
    complex and dynamic shader code, as it won't all need to be in one
    giant block with lots of "if"s nor will there need to be thousands of
    smaller shaders cluttering up the developer's mind. Performance of the
    shaders will still limit what can be done, but with this step DirectX
    helps reduce code complexity as a limiting factor in development.
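
    A quick back-of-the-envelope sketch shows why the "thousands of
    smaller shaders" problem appears. The feature counts in the usage
    note are made up for illustration, and both function names are
    hypothetical.

```python
from math import prod


def static_permutations(options_per_feature):
    """Shaders needed if every combination is compiled as its own program."""
    return prod(options_per_feature)


def linked_implementations(options_per_feature):
    """Class implementations needed if each feature is an interface slot."""
    return sum(options_per_feature)
```

    With, say, 4 light types, 5 materials, and 3 shadow modes, the
    static approach needs 4 * 5 * 3 = 60 separate shaders, while the
    linked approach needs 4 + 5 + 3 = 12 implementations plus one shader
    body that uses them.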

    With all of this, plus the ability to perform unordered memory
    accesses, multi-threading, tessellation, and the Compute Shader, DX11
    is pretty aggressive. The complexity of the upgrade, however, is
    mitigated by the fact that this is nothing like the wholesale changes
    made in the move from DX9 to DX10: DX11 is really just a superset of
    DX10 in terms of features. This allows DX11 to run on down-level
    hardware (where DX11 specific features are not used), which, when
    combined with the enhancements to HLSL with OOP and dynamic shader
    linking, means that developers should really have fewer qualms about
    moving from DX10 to DX11 than we saw with the transition from DX9.
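
    In practice, running on down-level hardware means an engine checks
    what the card supports and only switches on the DX11-only paths when
    they are available. A minimal sketch of that gating follows; the
    enum values, feature names, and the exact split between levels are
    assumptions that follow this article's description, not the real
    API.

```python
from enum import IntEnum


class HardwareLevel(IntEnum):
    # Hypothetical labels for this sketch, ordered by capability.
    DX10 = 100
    DX10_1 = 101
    DX11 = 110


def available_features(level):
    """Return the feature set an engine might enable at a hardware level."""
    # Per the article, the new HLSL and the multithreading improvements
    # are usable even on down-level hardware.
    features = {"hlsl_5_language", "multithreading"}
    if level >= HardwareLevel.DX11:
        # DX11-specific features need DX11 class hardware.
        features |= {"tessellation", "compute_shader"}
    return features
```

    An early DX11 title could call available_features once at startup
    and simply skip the tessellated path on DX10 cards.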

    To be fair, the OS upgrade requirement also threw a wrench in the
    gears. That won't be a problem this time, as Vista still sucks but
    will be getting DX11 support and Windows 7 looks like a better upgrade
    option for XP users than Vista. Developers who haven't already moved
    from DX9 may well skip DX10 altogether in favor of DX11, depending on
    the predicted ship dates of their titles. All signs point to DX11 as
    setting the time frame in which we start to see the revolution
    promised with the move to DX10 take place. Developers have had time
    to familiarize themselves with the extended advantages of
    programmability offered by DX10, and coding for DX11 will be much
    easier through OOP constructs and multithreaded support. If the
    features don't entice them, the ability to run on down-level hardware
    with a better coding environment might just seal the deal.

    I'm still an OpenGL developer at this point, and I've dabbled a bit
    with DirectX at times. But DirectX 11 (and my disappointment with
    OpenGL 3.0) marks the first time I think I might actually make the
    switch. The first preview of DX11 is already available in the latest
    DX SDK.
    NV55, Jan 31, 2009

  2. NV55

    rms Guest

    "NV55" <> wrote in message

    Good article, but I read it at the original source, which you didn't
    bother to include. Why not?

    rms, Feb 1, 2009