Supporting Unchecked.defaultof is a good idea; I will look into it, thanks.
Currently there are two ways to get the default value of a type 'T:
The first is to use Alea.CUDA.Intrinsic.__default_value<'T>() (see here). Intrinsic is an auto-opened module, so if you have opened the Alea.CUDA namespace you can call __default_value() directly in your code.
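The second is to open the namespace Alea.CUDA.Utilities and use the auto-opened NumericLiteralG module (see here). In your inline generic functions you can then write literals such as 0G, 1G, and so on directly.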
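To illustrate how generic literals such as 0G and 1G work, here is a minimal sketch in plain host-side F#. The NumericLiteralG module below is a hypothetical stand-in written for this example, not the one shipped in Alea.CUDA.Utilities, whose definition may differ; sumAll is likewise an illustrative name.

```fsharp
// Hypothetical stand-in for a NumericLiteralG module. The F# compiler
// translates the literal 0G into NumericLiteralG.FromZero() and 1G into
// NumericLiteralG.FromOne().
module NumericLiteralG =
    let inline FromZero () = LanguagePrimitives.GenericZero
    let inline FromOne () = LanguagePrimitives.GenericOne

// Generic sum: the accumulator starts at 0G, so the function works for any
// element type that supports (+) and has a Zero member.
let inline sumAll (xs: 'T[]) : 'T =
    let mutable acc : 'T = 0G
    for x in xs do
        acc <- acc + x
    acc

let intTotal = sumAll [| 1; 2; 3 |]      // 6
let floatTotal = sumAll [| 1.5; 2.5 |]   // 4.0
```

The function has to be marked inline so that the literal is resolved separately for each concrete element type at the call site.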
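For your second question, I have pasted the source code of some helper warp shuffle types below, which includes the broadcast usage. These helper static methods are available in the module Alea.CUDA.Intrinsic: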
/// A helper static class providing shuffle instructions.
[<AbstractClass;Sealed>]
type WarpShuffle private () =

    [<ReflectedDefinition>]
    static member Broadcast(input:'T, srcLane:int, width:int) =
        __shfl input srcLane width

    [<ReflectedDefinition>]
    static member Broadcast(input:'T, srcLane:int) =
        let width = __warp_size()
        __shfl input srcLane width

    [<ReflectedDefinition>]
    static member Up(input:'T, delta:int, width:int) =
        __shfl_up input delta width

    [<ReflectedDefinition>]
    static member Up(input:'T, delta:int) =
        let width = __warp_size()
        __shfl_up input delta width

    [<ReflectedDefinition>]
    static member Down(input:'T, delta:int, width:int) =
        __shfl_down input delta width

    [<ReflectedDefinition>]
    static member Down(input:'T, delta:int) =
        let width = __warp_size()
        __shfl_down input delta width

    [<ReflectedDefinition>]
    static member Xor(input:'T, laneMask:int, width:int) =
        __shfl_xor input laneMask width

    [<ReflectedDefinition>]
    static member Xor(input:'T, laneMask:int) =
        let width = __warp_size()
        __shfl_xor input laneMask width

///[omit]
[<AbstractClass;Sealed>]
type FullWarpShuffle private () =

    [<ReflectedDefinition>]
    static member Broadcast(input:'T, srcLane:int, logicWarpThreads:int) =
        let shflC = logicWarpThreads - 1
        __shfl_raw input srcLane shflC

    [<ReflectedDefinition>]
    static member Broadcast(input:'T, srcLane:int) =
        let shflC = __warp_size() - 1
        __shfl_raw input srcLane shflC

    [<ReflectedDefinition>]
    static member Up(input:'T, srcOffset:int) =
        let shflC = 0
        __shfl_up_raw input srcOffset shflC

    [<ReflectedDefinition>]
    static member Down(input:'T, srcOffset:int) =
        let shflC = __warp_size() - 1
        __shfl_down_raw input srcOffset shflC

    [<ReflectedDefinition>]
    static member Down(input:'T, srcOffset:int, warpThreads:int) =
        let shflC = warpThreads - 1
        __shfl_down_raw input srcOffset shflC
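The code above uses __shfl_raw, for which the online documentation is out of date. It is the raw version of the PTX instruction shfl.idx, where shflC contains two packed values: a mask that logically splits the warp into sub-segments, and an upper bound used to clamp the source lane index. Read more here.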