We have an algorithm that is performing poorly, and we suspect CPU cache misses are the cause. However, we can't prove it because we have no way of detecting them. Is there a way to measure how many CPU cache misses an algorithm produces? We are willing to port it to any language that would let us measure them.
Thanks in advance.