Suppose we have an array with cluster effect to some degree, such as
1 2 3 7 8 12 13 16 20 21 22 23
how do we represent this kind of data mathematically ? If we have the other array like this
1 2 10 11 20 21
the intersection of these two array is
1 2 20 21
Noted that we are in the situation that we have an fully paralleled algorithm to calculate the intersection of two arrays of this kind, we want to analyze the cost in math convention. The algorithm is about binary search every element of the short array in the longer one.
We designed some algorithm for GPU, which is quite fast. We find that the algorithm is faster on the data with these kind of cluster effect. Now we want to analyze our algorithm on these kind of data, but we have no idea to do this.
Is there something like random process or anything else can provide help to calculate the expectation of the cost ?