Xbox LIVE Indie Games

Which primitive type SMART XNA dev would use after all :)

Last post 11/29/2009 8:17 PM by Sergiusz. 5 replies.
  • 11/29/2009 12:28 AM

    Which primitive type SMART XNA dev would use after all :)

    Hi guys, I'm working on a custom geometry content format for my project. First I want to use it for a terrain component, which is basically going to be yet another chunked LOD / geoclipmapping / whatever implementation. Then I want to use it for other geometry needs, either static or skinned. So here is the question: after reading every paper Google could find on this :) and digging through the forums here, I'd like to know:

    What's the most efficient vertex topology scheme to use, so that it takes full advantage of modern GPU features like the vertex cache and renders geometry with the fewest possible draw calls?

    So far it looks to me that the obvious choice here is: TRIANGLE LIST.

    I read that it used to be triangle strips, but today that is supposedly no longer true, since with strips you're not really taking any advantage of the vertex cache on a modern GPU. Another bad thing about them is that while you can save bandwidth by sending less vertex info, you may need more draw calls when drawing complex meshes in some cases.

    For my project I'll probably end up writing a custom importer and then a processor for geometry, so by using triangle lists I could leverage a lot of useful bits already available in the XNA content pipeline, like the MeshHelper methods, especially OptimizeForCache*, and save myself a lot of work.

    So, which way to go guys?

    *) By the way, I read that OptimizeForCache wraps a native Direct3D implementation of Hoppe's vertex optimization and that it's quite a useful piece of code :) Why isn't it used every time, for example in the Generated Geometry sample? <- Shawn, your insight here would be greatly appreciated :)

    Thanks!
  • 11/29/2009 1:54 AM In reply to

    Re: Which primitive type SMART XNA dev would use after all :)

    There's always a balance to be found between performance and development time.

    I don't think triangle strips or fans are useful for terrain, because the chunks or patches themselves are square grids and you only want one draw call per patch/texture, which you can't do with strips. Plus, implementing LOD is easier with lists.

    Regarding the samples, I suspect they probably ignore performance details because they're not core to the sample subject, and would clutter the implementation.
  • 11/29/2009 5:15 AM In reply to

    Re: Which primitive type SMART XNA dev would use after all :)

    A triangle strip gives you one triangle per transformed vertex, at the limit: two vertices are needed to start up, and then one triangle is generated per input vertex. In practice you get even worse performance because you need to add stitching.

    A perfect transform cache for an indexed triangle list gives you two triangles per transformed vertex, at the limit. For a 10x10 quad patch (200 triangles, 121 vertices), it's easy to see that a perfect vertex cache will give you 1.65 triangles per transformed vertex.
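
    A minimal Python sketch of the arithmetic in the two paragraphs above. The function names are made up for illustration, and the "perfect cache" is an idealization in which every vertex is transformed exactly once:

```python
def grid_counts(quads_per_side):
    """Return (triangles, vertices) for a square patch of N x N quads."""
    triangles = 2 * quads_per_side * quads_per_side
    vertices = (quads_per_side + 1) ** 2
    return triangles, vertices


def list_triangles_per_vertex(quads_per_side):
    """Indexed list with an idealized cache: each vertex transformed once."""
    triangles, vertices = grid_counts(quads_per_side)
    return triangles / vertices


def strip_triangles_per_vertex(triangle_count):
    """Non-indexed strip without stitching: two start-up vertices,
    then one more transformed vertex per triangle."""
    return triangle_count / (triangle_count + 2)


if __name__ == "__main__":
    print(grid_counts(10))                   # triangles and vertices of a 10x10 patch
    print(list_triangles_per_vertex(10))     # ratio for the indexed list
    print(strip_triangles_per_vertex(200))   # ratio for a single 200-triangle strip
```

    The list ratio approaches 2 as the patch grows, while the strip ratio approaches 1 from below, which matches the limits quoted above.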

    So perhaps you want to try an indexed triangle strip? The benefit is that you'll need to feed fewer indices into the set-up engine. The drawback is that you still generate degenerate triangles for all the stitching edges. All in all, there's pretty much no difference in performance, so you might as well use the much nicer triangle list representation.
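
    To put rough numbers on the index-count trade-off, here is a small Python sketch that builds both index buffers for a quad grid with row-major vertices. The stitching scheme (repeating the last index of one row and the first index of the next) is one common choice assumed here, not something specified in this thread:

```python
def grid_list_indices(w, h):
    """Indexed triangle list for a w x h quad grid (row-major vertices)."""
    stride = w + 1
    indices = []
    for y in range(h):
        for x in range(w):
            a = y * stride + x   # top-left
            b = a + 1            # top-right
            c = a + stride       # bottom-left
            d = c + 1            # bottom-right
            indices += [a, c, b, b, c, d]
    return indices


def grid_strip_indices(w, h):
    """Single indexed triangle strip; rows are joined by repeated indices."""
    stride = w + 1
    indices = []
    for y in range(h):
        row = []
        for x in range(w + 1):
            row += [y * stride + x, (y + 1) * stride + x]
        if indices:
            # Stitch: repeat the previous index and the new row's first index.
            indices += [indices[-1], row[0]]
        indices += row
    return indices


def strip_triangles(indices):
    """Decode a strip and return (real, degenerate) triangle counts."""
    real = degenerate = 0
    for i in range(len(indices) - 2):
        if len(set(indices[i:i + 3])) < 3:
            degenerate += 1
        else:
            real += 1
    return real, degenerate


if __name__ == "__main__":
    print(len(grid_list_indices(10, 10)))
    print(len(grid_strip_indices(10, 10)))
    print(strip_triangles(grid_strip_indices(10, 10)))
```

    For a regular grid the strip needs far fewer indices even with the stitching; the degenerate triangles only show up as extra indices and extra work for primitive assembly.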

  • 11/29/2009 7:10 PM In reply to

    Re: Which primitive type SMART XNA dev would use after all :)

    Answer
    Assuming indexed geometry (which is essential if you want any kind of vertex cache reuse), both indexed lists and indexed strips end up with the same number of vertices, and both offer equal flexibility to rearrange the contents of the vertex buffer for maximum memory fetch cache efficiency. So there are really just two distinctions between the two:

    • How many indices must be sent to the GPU?
    • How much flexibility do you have to reorder triangles to maximize post vertex shader cache hits?

    Note: it's important to be aware that there are two types of vertex cache in the GPU: one between the memory subsystem and vertex shader inputs (which is affected by reordering vertices in the vertex buffer, and fixing up index values to match), then another between the vertex shader outputs and primitive assembler (which is affected by reordering triangles in the index buffer). Only that latter cache is affected here.
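
    The first kind of reordering (rearranging the vertex buffer and fixing up index values to match) could look roughly like the Python sketch below. The function name and the "first use" ordering are assumptions chosen for illustration; this is not what any particular optimizer is confirmed to do:

```python
def reorder_vertices_by_first_use(vertices, indices):
    """Rearrange the vertex buffer so vertices appear in the order the
    index buffer first references them, and remap the indices to match.

    The triangles themselves are unchanged; only memory layout differs.
    Vertices that are never referenced are dropped.
    """
    remap = {}            # old index -> new index
    new_vertices = []
    new_indices = []
    for old in indices:
        if old not in remap:
            remap[old] = len(new_vertices)
            new_vertices.append(vertices[old])
        new_indices.append(remap[old])
    return new_vertices, new_indices


if __name__ == "__main__":
    verts = ["a", "b", "c", "d"]
    idx = [3, 1, 2, 2, 1, 0]
    print(reorder_vertices_by_first_use(verts, idx))
```

    Because the order of triangles in the index buffer is untouched, this affects only the memory fetch side, not the post vertex shader cache.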

    Also note that I am ignoring the cost of degenerate triangles to link strips, because all GPUs have efficient trivial reject for such degenerates, and degenerates will always hit the post vertex shader cache, so their only real cost is in the primitive assembler stage, and it's pretty much unheard of for a modern GPU to be bottlenecked by primitive assembly. So for all practical purposes, the only cost of adding degenerate triangles is the extra indices.

    So to decide which will be more efficient, you need to look at how many indices are required, and how efficient your post vertex shader caching can be.

    Generally, if your geometry strips well, strips will require fewer indices than lists, but if it strips poorly, so that many degenerate links are required, the two can end up breaking roughly even.

    But when you look at cache efficiency, lists are almost always a win. Consider for instance a terrain grid, which is a very simple case for stripping and strips almost perfectly. Trouble is, if you have long strips running from one side of the terrain to the other, the vertices from one edge of each strip will be long gone from the cache by the time the next strip wants to reuse them. The most efficient rendering order for cache reuse on a 2D grid is a complex set of spirals, not at all a series of regular rows, but it is impossible to find an ordering that will both strip well and also maximize cache hits.
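
    The effect of long rows on the post vertex shader cache can be illustrated with a toy simulation in Python. Everything specific here is an assumption made for the sketch: a 16-entry FIFO cache, a 100x100 quad grid, and a simple "narrow vertical bands" ordering used as a stand-in for a cache-friendly order (it is not the spiral ordering mentioned above, and real hardware caches differ):

```python
from collections import deque


def quad_triangles(x, y, stride):
    """Six list indices for the quad whose top-left vertex is (x, y)."""
    a = y * stride + x
    b = a + 1
    c = a + stride
    d = c + 1
    return [a, c, b, b, c, d]


def row_major_order(w, h):
    """Triangles emitted in long rows across the whole grid."""
    indices = []
    for y in range(h):
        for x in range(w):
            indices += quad_triangles(x, y, w + 1)
    return indices


def banded_order(w, h, band_width):
    """Same triangles, emitted band by band so rows stay short."""
    indices = []
    for x0 in range(0, w, band_width):
        for y in range(h):
            for x in range(x0, min(x0 + band_width, w)):
                indices += quad_triangles(x, y, w + 1)
    return indices


def count_cache_misses(indices, cache_size):
    """Count vertex shader runs with a FIFO cache of transformed vertices."""
    cache = deque(maxlen=cache_size)  # oldest entry is dropped when full
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)
    return misses


if __name__ == "__main__":
    for name, order in [("rows", row_major_order(100, 100)),
                        ("bands", banded_order(100, 100, 6))]:
        misses = count_cache_misses(order, 16)
        print(name, misses, misses / (len(order) / 3))  # misses per triangle
```

    With long rows, the shared edge between two rows has been evicted by the time the next row needs it, so the simulation reports roughly one vertex shader run per triangle; the banded order, which a list can express freely, reuses most of that edge.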

    Combined with the complexity of computing triangle strips compared to lists, I would go with lists for pretty much everything. Strips make sense in a few very specific cases where you have geometry that naturally strips well and does not lend itself to post shader cache reuse, but such situations are incredibly rare.

  • 11/29/2009 7:10 PM In reply to

    Re: Which primitive type SMART XNA dev would use after all :)

    Sergiusz:
    I read that OptimizeForCache wraps a native Direct3D implementation of Hoppe's vertex optimization and that it's quite a useful piece of code :) Why isn't it used every time, for example in the Generated Geometry sample?


    Both processors used in the Generated Geometry sample end by chaining to the built-in ModelProcessor, which takes care of all the appropriate optimizations.
  • 11/29/2009 8:17 PM In reply to

    Re: Which primitive type SMART XNA dev would use after all :)

    Thank you guys very much for your comments: you helped me a lot and reassured me that I took the right direction with my design.
