I am currently reading the CUDA 5.0 sample AdvancedQuickSort. However, I cannot fully understand it because of the following code:
// Now compute my own personal offset within this. I need to know how many
// threads with a lane ID less than mine are going to write to the same buffer
// as me. We can use popc to implement a single-operation warp scan in this case.
unsigned lane_mask_lt;
asm( "mov.u32 %0, %%lanemask_lt;" : "=r"(lane_mask_lt) );
unsigned int my_mask = greater ? gt_mask : lt_mask;
unsigned int my_offset = __popc(my_mask & lane_mask_lt);
It is in the __global__ void qsort_warp function. The part I am stuck on is the inline assembly (the asm line). Can anyone explain what this assembly does?
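For reference, here is a minimal host-side sketch of what the snippet appears to compute. It assumes the PTX documentation's description of %lanemask_lt: a read-only special register holding a 32-bit mask with a bit set for every lane whose ID is lower than the current thread's lane ID. The helper names emulated_lanemask_lt, emulated_popc and emulated_offset are hypothetical and are not part of the sample. The code runs on the CPU and only imitates the per-lane arithmetic.

```cpp
#include <bitset>
#include <cstdint>

// Assumption: on the GPU, %lanemask_lt for lane N has bits 0..N-1 set.
// For a lane ID in 0..31 that is the same value as (1 << N) - 1.
std::uint32_t emulated_lanemask_lt(unsigned lane) {
    return (1u << lane) - 1u;
}

// Stand-in for the CUDA intrinsic __popc: counts the set bits of a 32-bit word.
unsigned emulated_popc(std::uint32_t x) {
    return static_cast<unsigned>(std::bitset<32>(x).count());
}

// Mirrors `__popc(my_mask & lane_mask_lt)` from the snippet.
// my_mask has one bit per lane that writes to the same buffer as this lane,
// so the result is the number of such lanes with a smaller lane ID, which is
// this lane's slot within that buffer.
unsigned emulated_offset(std::uint32_t my_mask, unsigned lane) {
    return emulated_popc(my_mask & emulated_lanemask_lt(lane));
}
```

As a worked case, suppose my_mask is 0x8D (lanes 0, 2, 3 and 7 write to the same buffer). Then emulated_offset gives 0 for lane 0, 1 for lane 2, 2 for lane 3 and 3 for lane 7, so the four lanes get consecutive slots without any loop or shared memory. This is why the comment calls it a single-operation warp scan.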