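I am trying to obtain feature vectors for the roughly 1300 images in my data set. One of the features I have to implement is shape, so I plan to use SIFT descriptors. For each image I run
[F, D] = vl_sift(image);   % image: single-precision grayscale matrix, as vl_sift requires
F is of size 4 x N (one keypoint frame per column: x, y, scale, orientation) and D is of size 128 x N (one 128-dimensional descriptor per keypoint), where N is the number of keypoints detected. The problem is that N differs from image to image.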
However, I want to obtain a single vector of size 128 x 1 that represents each image as well as possible. I have seen clustering and k-means mentioned, but I have no idea how to apply them here.
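A minimal sketch of the clustering approach usually called "bag of visual words", written in base MATLAB so no toolbox is needed. It pools the descriptors of all images and runs plain k-means on them to get K cluster centres (the "visual words"). Each image is then described by a normalised histogram that counts how many of its descriptors lie nearest to each centre. The cell array `descs`, the random placeholder data, `K = 128` and the iteration count are hypothetical choices for illustration; in practice `descs{i}` would hold the `D` returned by `vl_sift` for image i.

```matlab
% Bag-of-visual-words sketch (base MATLAB, no toolboxes).
% ASSUMPTION: descs{i} is the 128 x N_i uint8 matrix D from vl_sift for image i.
% Random data stands in for real descriptors here.
rng(0);                                   % reproducible demo
numImages = 5;                            % hypothetical; ~1300 in practice
descs = cell(numImages, 1);
for i = 1:numImages
    descs{i} = uint8(randi([0 255], 128, randi([50 200])));
end

K = 128;          % codebook size = length of the final vector (hypothetical choice)
numIters = 10;    % number of k-means (Lloyd) iterations

% 1) Stack all descriptors column-wise and convert to double.
X = double([descs{:}]);                   % 128 x (total number of keypoints)

% 2) Build the codebook with plain k-means, starting from K random descriptors.
perm = randperm(size(X, 2));
C = X(:, perm(1:K));                      % 128 x K cluster centres
for it = 1:numIters
    % Squared Euclidean distance from every centre to every descriptor (K x M).
    d2 = bsxfun(@plus, sum(C.^2, 1)', sum(X.^2, 1)) - 2 * (C' * X);
    [~, assign] = min(d2, [], 1);         % index of the nearest centre
    for k = 1:K
        members = (assign == k);
        if any(members)                   % keep the old centre if the cluster is empty
            C(:, k) = mean(X(:, members), 2);
        end
    end
end

% 3) Encode each image as a K x 1 histogram of visual words.
features = zeros(K, numImages);
for i = 1:numImages
    Di = double(descs{i});
    d2 = bsxfun(@plus, sum(C.^2, 1)', sum(Di.^2, 1)) - 2 * (C' * Di);
    [~, words] = min(d2, [], 1);          % visual word of each descriptor
    h = accumarray(words(:), 1, [K 1]);   % descriptor count per word
    features(:, i) = h / sum(h);          % L1-normalise so the bins sum to 1
end
```

The vector length is K, not 128 by construction: choosing K = 128 gives a 128 x 1 vector, but its entries are word frequencies rather than descriptor dimensions. With ~1300 images the stacked matrix and the K x M distance matrix can get large, so clustering a random subset of descriptors is a common shortcut. VLFeat's `vl_kmeans` or the Statistics Toolbox `kmeans` (which expects one observation per row) could replace the hand-written loop; check their documentation for the available options.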
The simplest idea is to average the N descriptors (each of size 128 x 1), which directly gives a 128 x 1 feature vector. But is taking the average meaningful? Or should I build some kind of histogram instead?
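For comparison, the averaging idea (often called mean pooling) is essentially a one-liner. This sketch assumes `D` is the 128 x N `uint8` matrix from `vl_sift`, with random data standing in for it, and adds an optional L2 normalisation, which is a design choice rather than a requirement. The last lines illustrate the main weakness: the mean keeps only the centre of the descriptor cloud, so quite different keypoint sets can collapse to the same vector.

```matlab
% Mean pooling: collapse a 128 x N descriptor matrix into one 128 x 1 vector.
% ASSUMPTION: D is the uint8 output of [F, D] = vl_sift(I); random placeholder here.
D = uint8(randi([0 255], 128, 300));

v = mean(double(D), 2);                   % 128 x 1, average over all keypoints
v = v / norm(v);                          % optional L2 normalisation

% Weakness: different descriptor sets can share the same mean.
% Columns are descriptors: D2 holds [10;50] and [30;70], D3 holds [20;60] twice.
D2 = uint8([10 30; 50 70]);
D3 = uint8([20 20; 60 60]);
same = isequal(mean(double(D2), 2), mean(double(D3), 2));   % true: both are [20; 60]
```

This is why a histogram over a learned codebook is the more common choice: it records how the descriptors are distributed, not just where their centre lies.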
Any help will be appreciated. Thanks!