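Well, this is computationally expensive. The approach I would take is a self-join on the field, to see how each value overlaps with every other value: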
select x.name, count(*)
from x cross join
x x2
where left(x.name, length(x.name)) = left(x2.name, length(x.name))
group by x.name
order by count(*) desc
I notice that the count for "John" is 7 rather than 8. I suspect that you don't want to match "Johny". To handle that, let's add an additional clause:
select x.name, count(*)
from x cross join
x x2
where left(x.name, length(x.name)) = left(x2.name, length(x.name)) and
(length(x.name) = length(x2.name) or substr(x2.name, length(x.name)+1, 1) = ' ')
group by x.name
order by count(*) desc
This assumes that the "shortest" version is present in the data. So, if "John" is not itself a row in the data, the query will not look for "John".
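To try both variants on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The sample names and the helper `prefix_counts` are hypothetical, invented for illustration. SQLite has no `left()`, so the sketch assumes `substr(s, 1, n)` as the equivalent, and it writes the length check as `length(x.name) = length(x2.name)`.

```python
import sqlite3

# Hypothetical sample data; only the table name x and column name come from the queries.
NAMES = ["John", "John Smith", "John Doe", "Johny", "Jane", "Jane Roe"]


def prefix_counts(names, require_word_boundary):
    """Count, for each name, how many rows start with that name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table x (name text)")
    conn.executemany("insert into x (name) values (?)", [(n,) for n in names])

    # SQLite has no left(); substr(s, 1, n) returns the first n characters.
    sql = """
        select x.name, count(*)
        from x cross join x x2
        where x.name = substr(x2.name, 1, length(x.name))
    """
    if require_word_boundary:
        # Accept only an exact match, or a match followed by a space.
        sql += """
          and (length(x.name) = length(x2.name)
               or substr(x2.name, length(x.name) + 1, 1) = ' ')
        """
    sql += " group by x.name order by count(*) desc, x.name"

    rows = conn.execute(sql).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(prefix_counts(NAMES, require_word_boundary=False))
    print(prefix_counts(NAMES, require_word_boundary=True))
```

With this sample data, "John" gets a count of 4 in the first variant (it also matches "Johny") and 3 once the extra clause is added. Calling `prefix_counts(["John Smith", "John Doe"], True)` returns no entry for "John" at all, which illustrates the caveat about the shortest version needing to be present as a row.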