
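I am trying to build a kernel that performs a parallel string search. For this I would like to use a finite state machine. The FSM's transition table is passed in the kernel argument states. The code: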

 __kernel void Find ( __constant char *text,
        const      int offset,
        const      int tlenght,
        __constant char *characters,
        const int clength,
        const int maxlength,
        __constant int *states,
        const int statesdim){

    private char c;
    private int state;
    private const int id = get_global_id(0);

    if (id<(tlenght-maxlength)) {

        private int cIndex,sd,s,k;

        for (int i=0; i<maxlength; i++) {

            c = text[i+offset];

            cIndex = -1;

            for (int j=0; j<clength; j++) {

                if (characters[j]==c) {
                    cIndex = j;
                }       
            }    

            if (cIndex==-1) {

                state = 0;
                break;

            }  else {

                s = states[state+cIndex*statesdim];

            }

            if (state<=0) break;

        }    
    }
}   

If I compile this kernel with iocgui, I get the following result:

Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU       T4400  @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was successfully vectorized
Done.
Build succeeded!

When I change the line that determines the new state to:

state = states[state+cIndex*statesdim];

the result is:

Using default instruction set architecture.
Intel OpenCL CPU device was found!
Device name: Pentium(R) Dual-Core CPU       T4400  @ 2.20GHz
Device version: OpenCL 1.1 (Build 31360.31426)
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
Build started
Kernel <Find> was not vectorized
Done.
Build succeeded!

1 Answer

The statement

X = states[state+cIndex*statesdim];

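cannot be vectorized, because across work-items the index does not necessarily evaluate to accesses to consecutive memory locations.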
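To make the access pattern concrete, here is a minimal plain-C sketch (not OpenCL, and not taken from the question) that models the indices a group of adjacent work-items would compute. The helper names text_lane_indices, fsm_lane_indices and is_unit_stride are hypothetical, as is the read text[base + id] used for comparison. It only illustrates why a data-dependent index gives the compiler no guaranteed consecutive access; it says nothing about what a particular vectorizer actually does.

```c
/* Indices that n adjacent work-items would use for a hypothetical
   read text[base + id]: they depend only on the work-item id. */
void text_lane_indices(int base, int n, int *out)
{
    for (int id = 0; id < n; id++)
        out[id] = base + id;
}

/* Indices the same work-items would use for
   states[state + cIndex*statesdim], where state and cIndex are
   per-work-item values that are only known at run time. */
void fsm_lane_indices(const int *state, const int *cIndex,
                      int statesdim, int n, int *out)
{
    for (int id = 0; id < n; id++)
        out[id] = state[id] + cIndex[id] * statesdim;
}

/* Returns 1 if every work-item touches the element directly after
   its predecessor's (idx[k+1] == idx[k] + 1), otherwise 0. */
int is_unit_stride(const int *idx, int n)
{
    for (int k = 0; k + 1 < n; k++)
        if (idx[k + 1] != idx[k] + 1)
            return 0;
    return 1;
}
```

With four work-items in states {0, 2, 1, 3} that have just read character indices {1, 0, 2, 1} and statesdim = 4, the table indices come out as {4, 2, 9, 7}, which is a scattered gather rather than one contiguous load.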

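Note that in your first kernel the target variable s is never written back to global memory. The compiler may therefore optimize the code and remove the statement s = states[state+cIndex*statesdim]; altogether. So it looks as if your statement was vectorized, but in fact it was not.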
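One way to keep the lookup from being optimized away is to store the final state somewhere observable. The following plain-C sketch models a single work-item of the Find kernel under several assumptions that are not in the question: a hypothetical output buffer result, a start state of 0, and a text index that includes the work-item id. In a real kernel, result would be an extra __global int * argument.

```c
/* Plain-C model of one work-item of the Find kernel.
   Assumptions (not in the original): `result` is a hypothetical
   output buffer, state 0 is the start state, and each work-item
   starts reading at its own id. */
void find_work_item(int id, const char *text, int offset, int tlength,
                    const char *characters, int clength, int maxlength,
                    const int *states, int statesdim, int *result)
{
    int state = 0;

    if (id >= tlength - maxlength)
        return; /* same bound as the original kernel */

    for (int i = 0; i < maxlength; i++) {
        char c = text[id + i + offset];
        int cIndex = -1;

        /* Map the character to its column in the transition table. */
        for (int j = 0; j < clength; j++)
            if (characters[j] == c)
                cIndex = j;

        if (cIndex == -1) {
            state = 0; /* character is not in the alphabet */
            break;
        }

        /* The data-dependent lookup discussed above. */
        state = states[state + cIndex * statesdim];
        if (state <= 0)
            break;
    }

    /* Writing the state out makes the lookup observable, so it can
       no longer be removed as dead code. */
    result[id] = state;
}
```

Because result[id] depends on every lookup in the loop, the build log then reflects the kernel that really runs, which for this access pattern is most likely the "was not vectorized" case.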

Answered 2012-10-21T07:06:56.153