I searched for two days trying to find an answer to the same issue you had. The answer is you need to set the garbage collection mode to Server mode. By default, garbage collection mode set to Workstation mode.
Setting garbage collection to Server mode causes the managed heap to split into separately managed sections, one-per CPU.
To do this, you need to add a config setting to your app.config file.
<runtime>
<gcServer enabled="true"/>
</runtime>
The speed difference on my 12-core Opteron 6172 was dramatic!