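Suppose I have a database table with three fields: a string title, an int A, and an int B. Both A and B range from 1 to 500.
I want to represent a subset of the values as a 5x5 matrix, so that (1, 1) holds the string with the lowest A and B, (5, 5) holds the string with the highest A and B, (1, 5) holds the string with the lowest A and the highest B, and so on.
Which algorithm should I use?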
2 Answers
You have

title  A  B
one    1  1
two    1  2
three  2  1
four   3  3
five   4  4
six    5  5
seven  5  1
eight  1  5

and so on...?
Reduced to a 3x3 matrix, it would look like

a/b  1      2     3
1    one    two   eight
2    three  four  ?
3    seven  ?     six
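The question is: what does (2,2) point to? The average? Fine, but then what about the cells of a 5x5 matrix? Your definition is missing some information.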
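The algorithm for the matrix above would be:
- Compute min, max, and avg for A and B
- Query the database for the tuples (Amin, Bmin), (Aavg, Bmin), (Amax, Bmin), and so on
- Fill the values into the matrix
Addendum: if there is no exact match, try a range around the min, max, and avg values.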
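The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the rows live in an in-memory list rather than a database, the names `build_matrix` and `tolerance` are hypothetical, and "try a range" is interpreted as accepting any row whose A and B both lie within `tolerance` of the target values, then picking the closest one.

```python
from statistics import mean

# Sample rows from the table above: (title, A, B)
rows = [
    ("one", 1, 1), ("two", 1, 2), ("three", 2, 1), ("four", 3, 3),
    ("five", 4, 4), ("six", 5, 5), ("seven", 5, 1), ("eight", 1, 5),
]

def build_matrix(rows, tolerance=1.0):
    """Build a 3x3 matrix of titles for the (min, avg, max) targets of A and B.

    `tolerance` is an assumed parameter: a row qualifies for a cell only if
    both of its values are within this distance of the cell's target values.
    """
    a_values = [a for _, a, _ in rows]
    b_values = [b for _, _, b in rows]
    a_targets = (min(a_values), mean(a_values), max(a_values))
    b_targets = (min(b_values), mean(b_values), max(b_values))

    matrix = []
    for ta in a_targets:          # matrix rows follow A
        line = []
        for tb in b_targets:      # matrix columns follow B
            candidates = [
                ((a - ta) ** 2 + (b - tb) ** 2, title)
                for title, a, b in rows
                if abs(a - ta) <= tolerance and abs(b - tb) <= tolerance
            ]
            # Closest qualifying row wins; None marks an empty cell ("?")
            line.append(min(candidates)[1] if candidates else None)
        matrix.append(line)
    return matrix

for line in build_matrix(rows):
    print(line)
```

With the sample rows and a tolerance of 1, this yields the same 3x3 layout as shown above, with `None` in the two cells marked "?". How wide the tolerance should be, and what to do for a 5x5 matrix, remains the open part of the definition.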
Answered 2012-04-21T10:22:00.837
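I set up a simulation here; the comments describe the steps.
First I generate some data: a series of records, each containing a string and two random numbers that represent scores A and B.
Next, I divide the ranges of A and B into five equally spaced bins, each bin giving the minimum and maximum for one cell.
Then I query the dataset serially to extract the strings that fall into each cell.
Depending on the actual data structure and storage you use, there are a hundred ways to optimize this.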
from random import random

# Generate data and keep a record of the scores
data = []
a_list = []
b_list = []
for i in range(50):
    a = int(random() * 500) + 1  # random integer in 1..500
    b = int(random() * 500) + 1
    rec = {'s': 's%s' % i,
           'a': a,
           'b': b}
    a_list.append(a)
    b_list.append(b)
    data.append(rec)

# Divide the A and B ranges into five bins
def make_bins(f_list):
    f_min = min(f_list)
    f_max = max(f_list)
    f_step_size = (f_max - f_min) / 5.0
    f_steps = [(f_min + i * f_step_size,
                f_min + (i + 1) * f_step_size)
               for i in range(5)]
    # Adjust the top bin to be just larger than the maximum,
    # so the maximum value itself passes the "less than" test
    top = f_steps[4]
    f_steps[4] = (top[0], f_max + 1)
    return f_steps

a_steps = make_bins(a_list)
b_steps = make_bins(b_list)

# Collect the strings that fit into any of the bins.
# Thus all the strings in cell[4,3] of your matrix
# would fit these conditions:
#   the string would have a Score A that is
#     greater than or equal to the first element in a_steps[3]
#     AND less than the second element in a_steps[3]
#   AND it would have a Score B that is
#     greater than or equal to the first element in b_steps[2]
#     AND less than the second element in b_steps[2]
# NOTE: the pointers need adjusting because the cells of
# your matrix are numbered from 1, while lists start at 0
def query_matrix(ptr_a, ptr_b):
    ptr_a -= 1
    from_a = a_steps[ptr_a][0]
    to_a = a_steps[ptr_a][1]

    ptr_b -= 1
    from_b = b_steps[ptr_b][0]
    to_b = b_steps[ptr_b][1]

    results = []
    for rec in data:
        s = rec['s']
        a = rec['a']
        b = rec['b']
        if (a >= from_a and
                a < to_a and
                b >= from_b and
                b < to_b):
            results.append(s)
    return results

# Print out the results for a visual check
# (print() calls so the script also runs on Python 3)
total = 0
for i in range(5):
    for j in range(5):
        print('=' * 80)
        print('Cell: ', i + 1, j + 1, ' contains: ', end=' ')
        hits = query_matrix(i + 1, j + 1)
        total += len(hits)
        print(hits)
print('=' * 80)
print('Total number of strings found: ', total)
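
As one possible optimization, the simulation above scans the whole dataset once per cell (25 scans); the cell of each record can instead be computed directly in a single pass. The sketch below is an assumption-laden variant, not part of the original simulation: `bin_index` and `build_cells` are hypothetical names, it expects integer scores, and because it uses integer arithmetic rather than float bin edges, a value sitting exactly on a bin boundary may occasionally land in a different neighbouring cell than above.

```python
from collections import defaultdict

def bin_index(value, lo, hi, bins=5):
    """Return the 1-based bin of an integer value in [lo, hi], split into equal-width bins."""
    if hi == lo:
        return 1  # degenerate range: everything falls into the first bin
    idx = (value - lo) * bins // (hi - lo)
    # The maximum value would compute to `bins`; clamp it into the top bin
    return min(idx, bins - 1) + 1

def build_cells(records, bins=5):
    """Group record strings by (A bin, B bin) in one pass over the data."""
    a_lo = min(r['a'] for r in records)
    a_hi = max(r['a'] for r in records)
    b_lo = min(r['b'] for r in records)
    b_hi = max(r['b'] for r in records)

    cells = defaultdict(list)
    for rec in records:
        key = (bin_index(rec['a'], a_lo, a_hi, bins),
               bin_index(rec['b'], b_lo, b_hi, bins))
        cells[key].append(rec['s'])
    return cells

# Small hand-made sample in the same record shape as the simulation
sample = [
    {'s': 'low-low', 'a': 1, 'b': 1},
    {'s': 'high-high', 'a': 500, 'b': 500},
    {'s': 'low-high', 'a': 1, 'b': 500},
    {'s': 'middle', 'a': 250, 'b': 250},
]
cells = build_cells(sample)
for key in sorted(cells):
    print(key, cells[key])
```

Looking up a cell is then a dictionary access such as `cells[(1, 5)]`, and the same bin arithmetic could be pushed into the database query if the data stays in a table.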
Answered 2012-04-21T11:16:43.417