I have extended this SO question and am comparing two LaTeX equations. Here is an example using two versions of the quadratic formula:
eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"
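One pitfall with equation literals like these: in an ordinary (non-raw) Python string, the `\f` at the start of `\frac` is the form-feed escape, so the backslash and the `f` are silently replaced by a single control character. A minimal check, assuming Python 3:

```python
plain = "\frac"  # "\f" is parsed as a form-feed escape character
raw = r"\frac"   # raw string keeps the backslash and the letter f

print(len(plain), repr(plain))  # 4 '\x0crac'
print(len(raw), repr(raw))      # 5 '\\frac'
```

Writing LaTeX literals as raw strings (with the `r` prefix) keeps them intact.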
I need the comparison to treat these as matching, because * stands in for x and b. My approach is to convert each equation into a list of words:
eqn1_word = ['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a']
eqn2_word = ['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a']
So the count vectors are:
eqn1_vec = Counter({'*': 3, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'pm': 1})
eqn2_vec = Counter({'b': 2, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'x': 1, 'pm': 1})
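The get_word helper is not shown here. As a hypothetical stand-in (not the code from my class), a single regular expression that keeps * tokens and runs of letters/digits, while dropping backslashes, braces, =, - and ^, reproduces the word lists and vectors above:

```python
import re
from collections import Counter

# Hypothetical stand-in for get_word: keep "*" or runs of letters/digits,
# discarding LaTeX punctuation such as \ { } = - ^
TOKEN_RE = re.compile(r"\*|[A-Za-z0-9]+")


def get_word(eqn):
    return TOKEN_RE.findall(eqn)


eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"

eqn1_word = get_word(eqn1)
eqn2_word = get_word(eqn2)
print(eqn1_word)  # ['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a']
print(eqn2_word)  # ['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a']

# Count vectors: token -> number of occurrences
eqn1_vec = Counter(eqn1_word)
eqn2_vec = Counter(eqn2_word)
```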
My extension is to compute the percentage of * tokens in eqn1_word, and then compute the ordinary cosine similarity as given in that answer. Finally, I add the two values; for equations that should match, the sum should be close to 1.
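Worked through for the example vectors above, writing n_* for the number of * tokens in eqn1_word, N for its total number of tokens, and v1, v2 for the two count vectors (notation chosen here for illustration). The six shared keys each contribute 1 to the dot product, while ||v1||² = 3² + 6 and ||v2||² = 2² + 7:

```latex
\text{score}
  = \frac{n_{*}}{N}
  + \frac{\mathbf{v}_1 \cdot \mathbf{v}_2}{\lVert \mathbf{v}_1 \rVert \, \lVert \mathbf{v}_2 \rVert}
  = \frac{3}{9} + \frac{6}{\sqrt{3^2 + 6}\,\sqrt{2^2 + 7}}
  = \frac{1}{3} + \frac{6}{\sqrt{165}}
  \approx 0.333 + 0.467 = 0.800
```

So for this pair the combined score falls well short of 1.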
This works fine in most scenarios (when only one variable is replaced by *). Here, though, * has a count of 3 in eqn1_vec, whereas in eqn2_vec those positions are split across two different variables: b = 2 and x = 1.
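To illustrate the difference, here is a sketch that reimplements the scoring described above as a standalone function. The names score_parts and tokenize, and the regex tokenizer, are hypothetical stand-ins rather than the original class code. It compares a version with only x replaced against the version with both x and b replaced:

```python
import math
import re
from collections import Counter


def tokenize(eqn):
    # Hypothetical tokenizer: "*" or runs of letters/digits
    return re.findall(r"\*|[A-Za-z0-9]+", eqn)


def score_parts(eqn_a, eqn_b):
    """Return (wildcard share of eqn_a, cosine similarity of the count vectors)."""
    va, vb = Counter(tokenize(eqn_a)), Counter(tokenize(eqn_b))
    total = sum(va.values())
    # "*" tokens that also occur in eqn_b are not counted as wildcards
    n_star = va["*"] - vb["*"]
    share = n_star / total if total else 0.0
    dot = sum(va[k] * vb[k] for k in set(va) & set(vb))
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values()))
    cosine = dot / norm if norm else 0.0
    return share, cosine


target = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"
one_star = r"*=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"     # only x replaced
three_stars = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"  # x and b replaced

for eqn in (one_star, three_stars):
    share, cosine = score_parts(eqn, target)
    print(f"{share:.3f} + {cosine:.3f} = {share + cosine:.3f}")
# 0.111 + 0.909 = 1.020
# 0.333 + 0.467 = 0.800
```

With one wildcard the sum reaches 1 (and is capped there); with three wildcards standing in for two different variables it stops around 0.8.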
For a fuller description, please see the referenced question. Based on that reference, my code is:
import math
import traceback
from collections import Counter


def get_cosine(self, c_eqn1_eqn, c_eqn2_eqn):
    print('c_eqn1_eqn =', c_eqn1_eqn)
    print('c_eqn2_eqn =', c_eqn2_eqn)
    # number of wildcard "*" characters in the first equation
    _special_symbol = float(c_eqn1_eqn.count("*"))
    cos_result = 0
    symbol_percentage = 0
    try:
        # get_word returns the list of words (tokens) in an equation
        eqn1_vector = Counter(self.get_word(c_eqn1_eqn))
        eqn2_vector = Counter(self.get_word(c_eqn2_eqn))
        _words = sum(eqn1_vector.values())
        # a "*" that also appears in eqn2 is not counted as a wildcard
        if "*" in eqn2_vector:
            _special_symbol -= eqn2_vector["*"]
        print('_special_symbol =', _special_symbol)
        print('_words @ last =', _words)
        try:
            symbol_percentage = _special_symbol / _words
        except ZeroDivisionError:
            symbol_percentage = 0.0
    except Exception as exp:
        print("Exception at converting equation to vector", exp)
        traceback.print_exc()
    else:
        # ordinary cosine similarity between the two count vectors
        intersection = set(eqn1_vector) & set(eqn2_vector)
        numerator = sum(eqn1_vector[x] * eqn2_vector[x] for x in intersection)
        _sum1 = sum(v ** 2 for v in eqn1_vector.values())
        _sum2 = sum(v ** 2 for v in eqn2_vector.values())
        denominator = math.sqrt(_sum1) * math.sqrt(_sum2)
        print('numerator =', numerator)
        print('denominator =', denominator)
        if not denominator:
            cos_result = 0
        else:
            cos_result = float(numerator) / denominator
    print(cos_result)
    # add the wildcard share to the cosine and cap the total at 1
    final_result = float(symbol_percentage) + cos_result
    return min(final_result, 1.0)
The problem is that the numerator gets small because the intersection is small: * never matches x or b, so those entries contribute nothing to the dot product. I copied this from my class, so please ignore self.
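Using the count vectors from the question directly, a quick check shows which keys fall outside the intersection and therefore add nothing to the numerator:

```python
from collections import Counter

eqn1_vec = Counter({'*': 3, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'pm': 1})
eqn2_vec = Counter({'b': 2, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'x': 1, 'pm': 1})

intersection = set(eqn1_vec) & set(eqn2_vec)
unmatched = set(eqn1_vec) ^ set(eqn2_vec)  # keys present in only one vector
numerator = sum(eqn1_vec[k] * eqn2_vec[k] for k in intersection)

print(sorted(intersection))  # ['2', '2a', '4ac', 'frac', 'pm', 'sqrt']
print(sorted(unmatched))     # ['*', 'b', 'x']
print(numerator)             # 6
```

The * entry (count 3) and the b and x entries (counts 2 and 1) carry a large share of each vector's norm, but they never meet in the dot product.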
How can I solve this? Thanks in advance. If there is any mistake in the question, or if my approach is wrong, please let me know.