The code below (which computes a cosine similarity) prints 1.0, 0.9999999999999998, or 1.0000000000000002 when run repeatedly on my computer. If I leave out the normalize step, it only ever prints 1.0. I thought floating-point operations were supposed to be deterministic. What could be causing this in my program if the same operations are applied to the same data on the same computer each time? Is it maybe something to do with where on the stack the normalize function is being called? How can I prevent it? (I've also put a small order-of-addition check after the code.)
#!/usr/bin/env python3
import math

def normalize(vector):
    # Scale each value so the vector has Euclidean norm 1 (modifies it in place).
    total = 0
    for key in vector.keys():
        total += vector[key]**2
    total = math.sqrt(total)
    for key in vector.keys():
        vector[key] = vector[key] / total
    return vector

dict1 = normalize({"a": 3, "b": 4, "c": 42})
dict2 = dict1  # same object, so the cosine similarity should be exactly 1.0

# Every key ends up in the list twice; the duplicates cancel out in the ratio.
n_grams = list(dict1.keys()) + list(dict2.keys())

numerator = 0
denom1 = 0
denom2 = 0
for n_gram in n_grams:
    numerator += dict1[n_gram] * dict2[n_gram]
    denom1 += dict1[n_gram]**2
    denom2 += dict2[n_gram]**2

print(numerator / (math.sqrt(denom1) * math.sqrt(denom2)))
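For what it's worth, here is a quick check, separate from the program above, of whether summation order alone can change a floating-point result (the values in the comments are what CPython prints for these particular constants):

# Floating-point addition is not associative, so the grouping matters:
values = [0.1, 0.2, 0.3]
print(sum(values))            # 0.6000000000000001  i.e. (0.1 + 0.2) + 0.3
print(sum(reversed(values)))  # 0.6                 i.e. (0.3 + 0.2) + 0.1

Both calls add exactly the same three numbers, yet the last digit differs, so even if every individual operation is deterministic, adding the same terms in a different order can still move the result by one unit in the last place.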