As of April 2013, the Steam Hardware Survey reports that only 64% of PCs support SSE4.1. In other words, if you assume SSE4.1 support, your code will crash on roughly a third of all consumer PCs.
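Rather than assuming support, native code can ask the CPU at runtime and fall back to an older code path. Here is a minimal sketch of such a check, assuming an x86/x64 build with MSVC, GCC or Clang. The names `has_sse41`, `CodePath` and `select_code_path` are made up for illustration; on any other compiler or architecture the sketch simply reports no support.

```cpp
#if defined(_MSC_VER)
#include <intrin.h>  // __cpuid
#endif

// Hypothetical helper: true if the CPU reports SSE4.1 support.
// SSE4.1 is advertised in CPUID leaf 1, ECX bit 19.
bool has_sse41() {
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
    int regs[4] = {0, 0, 0, 0};  // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[2] & (1 << 19)) != 0;
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
    return __builtin_cpu_supports("sse4.1") != 0;
#else
    return false;  // Unknown platform: assume no SSE4.1.
#endif
}

// Hypothetical dispatch: pick the code path once, at startup.
enum class CodePath { Sse2Fallback, Sse41 };

CodePath select_code_path(bool sse41_available) {
    return sse41_available ? CodePath::Sse41 : CodePath::Sse2Fallback;
}
```

Calling `select_code_path(has_sse41())` once at startup and storing the result keeps the check out of the hot loop.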
I am not familiar with Mono.Simd, but a good alternative on Windows is DirectXMath, if you can be bothered to write a suitable C++/CLI wrapper. Neither takes advantage of all the latest instructions, but you can supplement them with intrinsics as needed relatively easily. I'm not sure you'd do significantly better than Mono.Simd that way, though.
There is no such thing as "inline assembly" in C#; if you want to use C++ or assembly code from C#, you'll have to call it via P/Invoke or a C++/CLI wrapper. Of the two, C++/CLI has less overhead.
That said, if you need to optimize the hell out of a small piece of code, the best option might be to rewrite that piece of code entirely in native C++.
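To make the native route concrete, here is a rough sketch of what the C++ side of such a rewrite might look like when exposed for P/Invoke. The function `dot_product` and the macro `NATIVE_API` are hypothetical names, and the scalar loop is only a stand-in for whatever hot code you would actually move (and then vectorize with intrinsics).

```cpp
// Export with C linkage so the name is not mangled and the
// managed side can find it by its plain name.
#if defined(_WIN32)
#define NATIVE_API extern "C" __declspec(dllexport)
#else
#define NATIVE_API extern "C" __attribute__((visibility("default")))
#endif

// Hypothetical hot function moved to native code: dot product of two
// float arrays of length `count`. Plain pointers and ints keep the
// signature trivially marshalable from C#.
NATIVE_API float dot_product(const float* a, const float* b, int count) {
    float sum = 0.0f;
    for (int i = 0; i < count; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

On the C# side this would be declared with something like `[DllImport("native_math", CallingConvention = CallingConvention.Cdecl)] static extern float dot_product(float[] a, float[] b, int count);`, where `native_math` is an assumed DLL name. Each call crosses the managed/native boundary, so it pays to hand over whole arrays per call rather than single vectors.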