This is a broad question with a myriad of approaches, but the basic idea is that you need a few things to build a solid tracker:
- a good input (moving camera)
- a good feature / region descriptor (you chose a Haar wavelet based method)
- a good comparison metric for descriptors (I don't know what you are doing here)
Generically, you need to:
- stabilize your input through an image registration process
- reference your library of descriptors
- calculate a metric of those descriptors against your registered input image
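To make the "calculate a metric" step concrete, here is a minimal sketch of matching one query descriptor against a library using Euclidean distance with a rejection threshold. The descriptor values, the labels, the function names and the `max_distance` value are all made up for illustration. I don't know what metric you actually use, so treat this as one possible choice rather than the right one.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def best_match(query, library, max_distance):
    """Return (label, distance) for the closest library descriptor.

    The label is None when even the closest descriptor is farther away
    than max_distance, which is a simple way to reject weak matches.
    """
    best_label, best_dist = None, math.inf
    for label, descriptor in library.items():
        d = euclidean(query, descriptor)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist

# Hypothetical 3-element descriptors; real ones are much longer.
library = {"car": [0.9, 0.1, 0.0], "person": [0.1, 0.8, 0.3]}
print(best_match([0.85, 0.15, 0.05], library, max_distance=0.5))
print(best_match([5.0, 5.0, 5.0], library, max_distance=0.5))
```

Tightening `max_distance` trades false positives for missed detections, so it is worth tuning against labeled data rather than guessing.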
Reducing false positives depends on your approach, but typically you group the false positives together, find out which part of the algorithm produces the most of them (different ones will originate in different places), and then change that part of the algorithm to cull them better.
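The grouping step can be as simple as a tally. The sketch below assumes you have logged each false positive and labeled it (by hand or otherwise) with the stage you believe caused it. The stage names and record layout are hypothetical.

```python
from collections import Counter

def rank_false_positive_sources(false_positives):
    """Count false positives per originating stage, most frequent first."""
    counts = Counter(fp["stage"] for fp in false_positives)
    return counts.most_common()

# Hypothetical log of false positives, each tagged with a suspected stage.
false_positives = [
    {"frame": 3, "stage": "registration"},
    {"frame": 7, "stage": "descriptor_match"},
    {"frame": 9, "stage": "registration"},
    {"frame": 12, "stage": "threshold"},
    {"frame": 15, "stage": "registration"},
]

for stage, count in rank_false_positive_sources(false_positives):
    print(stage, count)
```

Whatever ends up at the top of that list is the part of the algorithm to work on first.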
Your options are:
- register the images if they aren't already (this will remove the false positives from relative motion)
- examine the algorithm to identify the largest source of false positives and modify it accordingly
- get better training data to match against (if your training data doesn't match the real-world data, this will never work)
- try a different approach (SIFT, SURF, GLOH, Fourier-Mellin transform, etc.)
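For the registration option, here is a rough sketch of the simplest possible case: estimating a pure integer translation between two small grayscale frames by brute-force search. It assumes the camera motion between frames is a small shift with no rotation or scaling, which is rarely true in practice. `estimate_shift` and `max_shift` are names I made up, and the test frames are synthetic.

```python
def estimate_shift(reference, moving, max_shift=2):
    """Find the integer (dy, dx) for which moving[y + dy][x + dx]
    best matches reference[y][x], by exhaustive search.

    The score is the mean absolute difference over the overlapping pixels.
    """
    h, w = len(reference), len(reference[0])
    best, best_err = (0, 0), float("inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, n = 0.0, 0
            for y in range(h):
                for x in range(w):
                    my, mx = y + dy, x + dx
                    if 0 <= my < h and 0 <= mx < w:
                        total += abs(reference[y][x] - moving[my][mx])
                        n += 1
            if n == 0:
                continue  # no overlap at this shift
            err = total / n
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best

def scene(y, x):
    """Synthetic intensity ramp standing in for real image content."""
    return 50 + 10 * y + x

# The moving frame sees the same scene offset by 1 row and 2 columns.
reference = [[scene(y, x) for x in range(6)] for y in range(6)]
moving = [[scene(y - 1, x - 2) for x in range(6)] for y in range(6)]
print(estimate_shift(reference, moving))
```

On this synthetic pair the search recovers the (1, 2) offset. Once you know the offset you can compare descriptors at corresponding positions, which is what removes the false positives caused by relative motion. For real footage you would want a proper registration method that handles rotation and scale.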
That's as specific as I can be given the information you provided. I hope it helps.