Xbox LIVE Indie Games

Writing SIMD Code With C# (but not XNA)

Last post 2/10/2011 7:14 PM by ShawMishrak. 11 replies.
  • 1/23/2009 1:57 AM

    Writing SIMD Code With C# (but not XNA)

    So, it appears that Mono has beaten Microsoft .NET to the SIMD realm.  (A subset of) SSE intrinsics are now available for use on Mono 2.2 for x86 through C# and Mono.Simd, and are treated specially by the JIT compiler to generate optimized code, entirely in the managed realm.  At first I was skeptical, but after playing around with it for a while, I have to say I'm impressed.  I'm anxious to see where this goes in future versions of Mono.  Of course, a picture is worth a thousand words...





    Of special interest is the purple line.  The hand-optimized C# matrix multiplication code using Mono.Simd is almost as fast as optimized C++ code using hand-written SSE intrinsics.  And even more fascinating is that the hand-optimized C# code performs significantly faster at larger problem sizes than sequential C++ code with full compiler optimizations.  Of course this is a synthetic benchmark, but there is a lot of potential here!

    And a post like this is worthless without code:

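      // Excerpts from my test matrix structure. 
      using System.Runtime.InteropServices;  // StructLayout, FieldOffset 
      using Mono.Simd;                       // Vector4f, ShuffleSel 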
     
      [StructLayout(LayoutKind.Explicit)] 
      public struct SIMDMatrix 
      { 
        [FieldOffset(0)] 
        public float M11; 
        [FieldOffset(4)] 
        public float M12; 
        [FieldOffset(8)] 
        public float M13; 
        [FieldOffset(12)] 
        public float M14; 
        [FieldOffset(16)] 
        public float M21; 
        [FieldOffset(20)] 
        public float M22; 
        [FieldOffset(24)] 
        public float M23; 
        [FieldOffset(28)] 
        public float M24; 
        [FieldOffset(32)] 
        public float M31; 
        [FieldOffset(36)] 
        public float M32; 
        [FieldOffset(40)] 
        public float M33; 
        [FieldOffset(44)] 
        public float M34; 
        [FieldOffset(48)] 
        public float M41; 
        [FieldOffset(52)] 
        public float M42; 
        [FieldOffset(56)] 
        public float M43; 
        [FieldOffset(60)] 
        public float M44; 
     
        [FieldOffset(0)] 
        private Vector4f R0; 
        [FieldOffset(16)] 
        private Vector4f R1; 
        [FieldOffset(32)] 
        private Vector4f R2; 
        [FieldOffset(48)] 
        private Vector4f R3; 
     
        public static void MultiplySIMD(ref SIMDMatrix matrix1, ref SIMDMatrix matrix2, ref SIMDMatrix result) 
        { 
          // Vector4f is a Mono.Simd type. 
          Vector4f t1, t2; 
          Vector4f out0, out1, out2, out3; 
     
          t1 = (Vector4f.Shuffle(matrix1.R0, ShuffleSel.ExpandX) * matrix2.R0) + (Vector4f.Shuffle(matrix1.R0, ShuffleSel.ExpandY) * matrix2.R1); 
          t2 = (Vector4f.Shuffle(matrix1.R0, ShuffleSel.ExpandZ) * matrix2.R2) + (Vector4f.Shuffle(matrix1.R0, ShuffleSel.ExpandW) * matrix2.R3); 
          out0 = t1 + t2; 
     
          t1 = (Vector4f.Shuffle(matrix1.R1, ShuffleSel.ExpandX) * matrix2.R0) + (Vector4f.Shuffle(matrix1.R1, ShuffleSel.ExpandY) * matrix2.R1); 
          t2 = (Vector4f.Shuffle(matrix1.R1, ShuffleSel.ExpandZ) * matrix2.R2) + (Vector4f.Shuffle(matrix1.R1, ShuffleSel.ExpandW) * matrix2.R3); 
          out1 = t1 + t2; 
     
          t1 = (Vector4f.Shuffle(matrix1.R2, ShuffleSel.ExpandX) * matrix2.R0) + (Vector4f.Shuffle(matrix1.R2, ShuffleSel.ExpandY) * matrix2.R1); 
          t2 = (Vector4f.Shuffle(matrix1.R2, ShuffleSel.ExpandZ) * matrix2.R2) + (Vector4f.Shuffle(matrix1.R2, ShuffleSel.ExpandW) * matrix2.R3); 
          out2 = t1 + t2; 
     
          t1 = (Vector4f.Shuffle(matrix1.R3, ShuffleSel.ExpandX) * matrix2.R0) + (Vector4f.Shuffle(matrix1.R3, ShuffleSel.ExpandY) * matrix2.R1); 
          t2 = (Vector4f.Shuffle(matrix1.R3, ShuffleSel.ExpandZ) * matrix2.R2) + (Vector4f.Shuffle(matrix1.R3, ShuffleSel.ExpandW) * matrix2.R3); 
          out3 = t1 + t2; 
     
          result.R0 = out0; 
          result.R1 = out1; 
          result.R2 = out2; 
          result.R3 = out3; 
        } 
     
        // Additional code clipped 
      }   
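
    For comparison, here is a minimal scalar sketch of what the SIMD routine above appears to compute, assuming row-major storage and assuming that ShuffleSel.ExpandX/Y/Z/W replicates one component of a row across all four lanes.  The class name ScalarMatrixReference and the float[16] layout are hypothetical, chosen only for illustration; this is not the sequential code used in the benchmark.

```csharp
using System;

public static class ScalarMatrixReference
{
    // Multiplies two row-major 4x4 matrices stored as float[16].
    // Hypothetical helper for illustration; not part of Mono.Simd or XNA.
    public static float[] Multiply(float[] a, float[] b)
    {
        if (a == null || b == null || a.Length != 16 || b.Length != 16)
            throw new ArgumentException("Both matrices must have 16 elements.");

        var result = new float[16]; // zero-initialized accumulator

        for (int row = 0; row < 4; row++)
        {
            for (int k = 0; k < 4; k++)
            {
                // One scalar of the left-hand row; the SIMD version is assumed
                // to broadcast this value across a whole vector via Shuffle.
                float broadcast = a[row * 4 + k];

                // Scale row k of the right-hand matrix and accumulate.
                // The SIMD version does these four lanes in one multiply/add.
                for (int col = 0; col < 4; col++)
                    result[row * 4 + col] += broadcast * b[k * 4 + col];
            }
        }

        return result;
    }
}
```

    Each output row is a weighted sum of the four rows of the right-hand matrix, which is why the SIMD version needs only shuffles, multiplies, and adds, with no horizontal operations.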



    So, maybe now the CLR developers at Microsoft will have more motivation for getting SIMD functionality into .NET, especially on Xbox.  After all, they can't let Mono get too far ahead.  :)

  • 1/23/2009 2:11 AM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    ShawMishrak:
    So, maybe now the CLR developers at Microsoft will have more motivation for getting SIMD functionality into .NET, especially on Xbox.  After all, they can't let Mono get too far ahead.  :)


    While I would be one of the first to welcome such support, esp. on the 360, I doubt it will ever come to the 360, as the target user base is rather small to warrant investment in AltiVec support. And where's the link to the Connect issue so I can cast my vote? :)
  • 1/23/2009 10:11 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Thanks for sharing this, Shaw.  It's interesting to see the attributes and data structures necessary to take advantage of Mono's SIMD intrinsics.

    Of course, we're quite aware of the gains to be had by supporting the Xbox 360's AltiVec/VMX128 ISA with enhanced floating-point and SIMD instructions.  Right now, however, the XNA runtime on Xbox 360 depends on the .NET Compact Framework, whose JIT was designed to generate code for over a half-dozen architectures, most of them lacking floating-point hardware.

    This is a request near to my heart, so we'll keep it in mind as we look to improve performance in future releases.

    James
  • 1/23/2009 10:37 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    jamesreggio:
    Right now, however, the XNA runtime on Xbox 360 depends on the .NET Compact Framework, whose JIT was designed to generate code for over a half-dozen architectures, most of them lacking floating-point hardware.


    Yup, we're using a VM designed for mobile phones... ;)

    I think the VM needs to support decent inlining and instruction scheduling before even thinking about SIMD, though that would be nice to see as well. I know for a fact that VMX would bring matrix/vector maths performance up by about 14x on the Xbox, back to roughly the performance you might expect on a desktop PC.

    I'm not holding my breath though - after all, there are dev kits and C++ compilers available if performance really matters.
  • 7/4/2010 10:13 AM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Wouldn't it be theoretically possible to design the CLR so that you basically tick the feature set you want and then deploy that together with the compact framework?

    With the compact CLR (if that's what it's called) targeting mobile devices or devices that lack floating-point support, it surprises me that it was considered a better fit for Xbox 360 development. The compact framework is just a class library; that stuff we can write ourselves (though it would be nice to have a set of well-designed foundation classes). But in terms of CLR/JIT capabilities and performance, we can't do anything about that.

    A CLR run-time which targets gaming should definitely support the entire performance feature set including SIMD operations.

    Also, since the CLR is basically an abstraction layer, it should be able to swing both ways in terms of SSE2 or AltiVec implementations depending on the CPU architecture. The compact CLR isn't running my XNA game targeting Windows, is it?

    Please vote for this item on the Microsoft Connect website:
    https://connect.microsoft.com/VisualStudio/feedback/details/573062/system-hardwareaccelerated
  • 7/4/2010 3:34 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    leidegre:
    The compact framework is just a class library


    Not quite. It's also a complex runtime environment that includes things like assembly loader, IL parser, JIT, GC...
  • 7/4/2010 9:24 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    As far as I am aware, the XNAMath library is SIMD-aware (using either SSE2 or AltiVec), as this is stated in several DX documentation files. In D3D11 the D3DXMath library doesn't even exist anymore and is replaced with the XNAMath lib.
  • 7/4/2010 11:16 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Yes, but irony of ironies, the XNA Math library (http://msdn.microsoft.com/en-us/library/ee418725(VS.85).aspx) is a C++ library for use with DirectX and the XDK that seems to be entirely unconnected to XNA Game Studio and managed code (C#). I'm hopeful they named it that because they're planning an underlying switch of the XNA Framework's matrix and vector structs to it (thus enabling SIMD), but I haven't seen word one indicating that, so if it is in the works, it's something they probably haven't announced yet.
  • 7/4/2010 11:42 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Actually, it's named that way because of history, not because of a future release...

    Remember how XNA originally applied to ALL Microsoft gaming technologies and XNA Game Studio was a product under that brand? XNAMath was created during that time, so the math library was called that. It has nothing to do with XNA Game Studio.
  • 2/2/2011 2:34 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Mono is still improving their SIMD API. Two years later, is there any news on SIMD integration into the Microsoft .NET CLR, or on improving JIT code generation for all the platforms?
  • 2/3/2011 1:25 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    JIT code generation improved a lot on 3.5 SP1 (inline of methods with value type parameters). Not sure there have been any changes in the CF, info on this subject is surprisingly hard to find :(
  • 2/10/2011 7:14 PM In reply to

    Re: Writing SIMD Code With C# (but not XNA)

    Damn, two years later and I've completely forgotten about this thread.  I have not been following the XNA Game Studio (or whatever it's called nowadays) scene for a while now.  In fact, this is the first time I've logged in to post to the new forum (conveniently linked from forums.xna.com).

    Seriously, seeing this post come up and realizing it's from over two years ago makes me feel old... thanks!


    VicenteJade:
    JIT code generation improved a lot on 3.5 SP1 (inline of methods with value type parameters). Not sure there have been any changes in the CF, info on this subject is surprisingly hard to find :(


    This leads to one of the most disappointing aspects of the .NET Framework, to me: a general lack of information about the "behind the scenes" inner workings of the JIT compilers.  When XNA Game Studio 1.0 came out, there were a few good blog posts from the compiler engineers that explained things like floating-point performance and why it is the way it is on the CF on Xbox.  Fast forward several years and look at XNA Game Studio 4.0.  There has been virtually no additional information given during that time.  Nor is there any feasible way to do performance analysis on the Xbox.  Having a "best practices" document can only take you so far.

    I'm really not trying to start anything by saying this, but the lack of introspection into generated code is one of the main reasons I have not touched XNA code in over a year.  When I write C++ code for OS X, Windows, Linux, Android, or iOS, I have tools available that I can use for both profiling and looking directly at generated assembly code.  I can answer questions like:  (a) where are my hotspots?; (b) am I fully utilizing the processors' memory hierarchy and SIMD units?; (c) how good a job is the instruction scheduler doing at reducing hazards?  Not to mention that these other platforms allow full utilization of the hardware.  Do I need more floating-point throughput for my physics solver on my phone?  If so, I write NEON/VFP code.  How about on my desktop?  If so, I write SSE/AVX code.  Xbox?  Sorry, the beefy AltiVec unit cannot be accessed through managed code.  Grr...

    On Xbox with XNA, it's always a shot in the dark without a better understanding of how the compiler actually works.