When using the Hough transform, you create a signature storing the displacement vector of every feature from the template centroid (either the geometric center (w/2, h/2) or a centroid computed from image moments).
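A small sketch of the two centroid options, assuming an OpenCV grayscale template; the file name and variable names are illustrative only:

```python
import cv2

template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
h, w = template.shape

# Option 1: geometric center of the template image.
centroid = (w / 2.0, h / 2.0)

# Option 2: intensity-weighted centroid from image moments (m10/m00, m01/m00);
# central moments are then taken about this point.
m = cv2.moments(template)
if m["m00"] > 0:
    centroid = (m["m10"] / m["m00"], m["m01"] / m["m00"])
```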
E.g., for 10 SIFT features found on the template, their positions relative to this centroid are stored as a vector of offsets {a, b}. Now let's search for this object in a query image: every SIFT feature found in the query image that matches one of the template's 10 features casts a vote for its expected centroid location:
votemap[feature.x - a, feature.y - b] += 1

where (a, b) is the offset stored for this particular matched feature.
If several of those features cast their votes at (roughly) the same point (clustering the votes is essential), you have found an object instance.
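To make the bookkeeping concrete, here is a minimal sketch of the translation-only case (no scale/rotation yet). It assumes you already have the template keypoint positions, the query keypoint positions and a list of (template index, query index) match pairs from your own SIFT matching step; all names are illustrative, not a fixed API:

```python
import numpy as np

def build_signature(template_kps, centroid):
    """Offset (a, b) of every template keypoint from the template centroid."""
    cx, cy = centroid
    return [(x - cx, y - cy) for (x, y) in template_kps]

def vote_for_centroids(signature, query_kps, matches, image_shape):
    """Every matched query keypoint votes for where the centroid should be."""
    h, w = image_shape
    votemap = np.zeros((h, w), dtype=np.int32)
    for t_idx, q_idx in matches:            # (template index, query index) pairs
        a, b = signature[t_idx]             # offset stored for this template feature
        x, y = query_kps[q_idx]
        vx, vy = int(round(x - a)), int(round(y - b))   # predicted centroid position
        if 0 <= vx < w and 0 <= vy < h:
            votemap[vy, vx] += 1            # rows are y, columns are x
    return votemap

# Peaks of the (optionally smoothed / binned) votemap are candidate object centers:
# cy, cx = np.unravel_index(votemap.argmax(), votemap.shape)
```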
Signature building and voting are inverse procedures. Let's assume V = (-20, -10) for one template feature. During the search in the novel image, when the two matches are found, we read off their relative orientation and scale and cast the corresponding votes. E.g., for the right box, which appears at half size and rotated by -10 degrees, the centroid lies at V' = (0.5*(20*cos(-10°) - 10*sin(-10°)), 0.5*(20*sin(-10°) + 10*cos(-10°))) ≈ (+10.7, +3.2) away from the SIFT feature, because the stored offset has to be scaled by 0.5 and rotated by -10 degrees (a standard 2D rotation) before it is applied.
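A sketch of that similarity-aware vote, assuming keypoints that expose pt, size and angle like OpenCV's cv2.KeyPoint (an assumption; the relative scale and rotation come from the ratio of sizes and the difference of angles):

```python
import math

def vote_point(query_kp, template_kp, offset):
    """Predict the centroid from one match, compensating for scale and rotation.

    offset = (a, b) is the template keypoint's displacement from the template
    centroid; query_kp / template_kp are assumed to expose .pt, .size and
    .angle (degrees), like OpenCV's cv2.KeyPoint.
    """
    s = query_kp.size / template_kp.size                      # relative scale, e.g. 0.5
    theta = math.radians(query_kp.angle - template_kp.angle)  # relative rotation, e.g. -10 deg
    a, b = offset
    # Rotate and scale the stored offset before subtracting it from the query position.
    a_rot = s * (a * math.cos(theta) - b * math.sin(theta))
    b_rot = s * (a * math.sin(theta) + b * math.cos(theta))
    x, y = query_kp.pt
    return x - a_rot, y - b_rot                               # predicted centroid
```

With offset = (-20, -10), s = 0.5 and theta = -10°, this places the vote about (+10.7, +3.2) away from the query keypoint, matching the hand calculation above.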