Section 6.4.2 of the PTX manual, version 2.3 (http://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/ptx_isa_2.3.pdf), states:
Array elements can be accessed using an explicitly calculated byte address, or by indexing into the array using square-bracket notation. The expression within square brackets is either a constant integer, a register variable, or a simple “register with constant offset” expression, where the offset is a constant expression that is either added or subtracted from a register variable. If more complicated indexing is desired, it must be written as an address calculation prior to use.
ld.global.u32 s, a[0];
ld.global.u32 s, a[N-1];
mov.u32 s, a[1]; // move address of a[1] into s
When I try this, I can only get the pointer-plus-byte-offset form to work, i.e. [a+0].
This code fails to load:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,a[0];
Whereas this loads fine:
.reg .f32 f<1>;
.global .f32 a[10];
ld.global.f32 f0,[a+0];
The problem with the byte-offset version is that it really is a byte offset, so one has to take the size of the underlying type into account: the second element is [a+4], whereas a[1] is supposed to work this out for you.
Any ideas what's going wrong?
EDIT
There is an even more severe issue involved here: the text quoted above states that a register variable can be used to index the array, like:
ld.global.f32 f0,a[u0];
where u0 is presumably a .reg .u32 or some other compatible integer register.
However, with the pointer plus byte offset method this is not possible. It is illegal to do something like:
mul.u32 u1,u0,4;
ld.global.f32 f0,[a+u1]; // here a reg variable is not allowed.
This is a severe limitation. One can, of course, do a separate address calculation prior to the load statement, but this complicates things.
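The address calculation prior to the load could look roughly like the sketch below. It assumes 64-bit addressing (for 32-bit addressing the .u64 operations would presumably become .u32 ones), and the registers rd0 and rd1 are names made up for illustration; u0 is the index register from the example above.

```ptx
.reg .f32 f<1>;
.reg .u32 u<1>;                // u0 holds the element index
.reg .u64 rd<2>;               // hypothetical 64-bit address registers (assumption)
.global .f32 a[10];

mov.u64       rd0, a;          // rd0 = address of a
mul.wide.u32  rd1, u0, 4;      // rd1 = byte offset = index * 4 bytes per .f32
add.u64       rd0, rd0, rd1;   // rd0 = address of element u0
ld.global.f32 f0, [rd0];       // load that element through the computed address
```

That is three extra instructions and two extra registers for what a[u0] is supposed to express in a single operand.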